Adel Abu Hashim - Oct 2021

This case study aims to help Amber Heard by analyzing new accounts posting and commenting against the victim of a social-bot disinformation/influence operation.

We have four main datasets (scraped from Reddit):

1. Submissions & comments data (2018).
2. Users data (2006 to 2018).
3. A merged dataset (submissions & comments data joined with users data).
4. Daily creation data (number of accounts created per day, 2006 to 2018).
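The merged dataset can be reproduced by joining contributions onto user metadata by account name. A minimal sketch with toy frames (the column names mirror the real datasets, but the values here are invented):

```python
import pandas as pd

# Toy contributions and users frames mirroring the real columns
contributions = pd.DataFrame({
    "author": ["alice", "bob", "alice"],
    "text": ["hi", "hello", "hey"],
})
users = pd.DataFrame({
    "user_name": ["alice", "bob"],
    "user_created_at": pd.to_datetime(["2018-01-01", "2006-06-15"]),
})

# Left-join each contribution onto the metadata of the account that posted it
merged = contributions.merge(users, left_on="author", right_on="user_name", how="left")
print(merged.shape)  # (3, 4)
```

A left join keeps every contribution even when the author is missing from the users table (e.g. deleted accounts), which would surface as NaT in `user_created_at`.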
#import dependencies
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import helpers
import matplotlib.dates as mdates
import plotly.express as px
import plotly.graph_objects as go
import re
import warnings
warnings.filterwarnings('ignore')
sb.set_style("darkgrid")
%matplotlib inline
# load data
df = pd.read_csv("cleaned_data/reddit_cleaned_2018.csv")
df_merged = pd.read_csv("cleaned_data/reddit_merged_2018.csv")
df_users = pd.read_csv("cleaned_data/users_cleaned.csv")
# convert to datetime
df.created_at = pd.to_datetime(df.created_at)
df_merged.created_at = pd.to_datetime(df_merged.created_at)
df_merged.user_created_at = pd.to_datetime(df_merged.user_created_at)
df_users.user_created_at = pd.to_datetime(df_users.user_created_at)
Reddit Contributions (Comments / Submissions)
px.bar(data_frame=df['submission_comment'].value_counts().to_frame().reset_index(),
x="index", y="submission_comment", color='submission_comment').update_layout(title='Comment or Submission',
xaxis_title='contribution category',
yaxis_title='number of contributions').update_traces(marker_color='#5296dd')
df.author.value_counts().to_frame().head(10).reset_index()
| | index | author |
|---|---|---|
| 0 | -banned- | 1666 |
| 1 | Night_Chicken | 135 |
| 2 | AutoModerator | 50 |
| 3 | emilyguy | 38 |
| 4 | AutoNewsAdmin | 35 |
| 5 | ccrraapp | 34 |
| 6 | Rednaxela117 | 33 |
| 7 | AutoNewspaperAdmin | 33 |
| 8 | ZorakLocust | 27 |
| 9 | tenchineuro | 22 |
fig = px.bar(df.author.value_counts().to_frame().head(10).reset_index(), x="author", y="index",
             height=500,
             title='Users with the most contributions in 2018').update_traces(
                 marker_color='#5296dd').update_layout(
                 xaxis_title='number of contributions',
                 yaxis_title='user name')
fig.update_yaxes(autorange="reversed")
AutoModerator is a system built into Reddit that allows moderators to define "rules" (consisting of checks and actions) to be automatically applied to posts in their subreddit.
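When hunting for influence accounts it helps to set known service bots aside before ranking authors. A minimal sketch (the `KNOWN_BOTS` set is an assumption drawn from the author names seen above, not an official list):

```python
import pandas as pd

# Toy author column mirroring the real dataset
authors = pd.DataFrame({"author": ["AutoModerator", "Night_Chicken",
                                   "AutoNewsAdmin", "emilyguy"]})

# Known Reddit service bots that should not be mistaken for influence accounts
KNOWN_BOTS = {"AutoModerator", "AutoNewsAdmin", "AutoNewspaperAdmin"}

# Keep only presumably human authors
humans = authors[~authors.author.isin(KNOWN_BOTS)].reset_index(drop=True)
print(humans.author.tolist())  # ['Night_Chicken', 'emilyguy']
```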
df_auto_moderator = df.query(" author == 'AutoModerator' ").reset_index(drop=True)
print(df_auto_moderator.shape)
df_auto_moderator.head(1)
(50, 17)
| | child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | t1_dsmgq7k | /r/SubredditDrama/comments/7q5eif/rharrypotter... | We're sorry, but accounts under 14 days old ma... | t3_7q5eif | r/SubredditDrama | AutoModerator | 2018-01-13 16:30:08 | Negative | Positive | 1.0 | submission | comment | 82 | rharrypotter_duel_on_the_topic_of_domestic_abuse | 8 | [] | 0 |
df_auto_moderator.subreddit.value_counts().head(10)
r/DC_Cinematic        12
r/JerkOffToCelebs      6
r/AskReddit            4
r/worldnews            4
r/Celebs               3
r/youtube              2
r/unpopularopinion     2
r/videos               2
r/politics             1
r/photoshopbattles     1
Name: subreddit, dtype: int64
df_auto_moderator['permalink'].iloc[26]
'/r/worldnews/comments/9ymw4g/the_real_reason_amber_heard_hesitated_to_take_her/ea2jaoy/'
df_auto_moderator.text.value_counts().head();
df_night_chicken = df.query(" author == 'Night_Chicken' ").reset_index(drop=True)
print(df_night_chicken.shape)
df_night_chicken.head()
(135, 17)
| | child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | t1_dslw4u6 | /r/Celebs/comments/7pysyp/amber_heard/dslw4u6/ | What? What did she hear? | t3_7pysyp | r/Celebs | Night_Chicken | 2018-01-13 04:48:41 | Neutral | Neutral | -6.0 | submission | comment | 5 | amber_heard | 2 | [] | 0 |
| 1 | t1_dvufzj9 | /r/Celebs/comments/844rw7/amber_heard/dvufzj9/ | What? Thank you! | t1_dvn0nfy | r/Celebs | Night_Chicken | 2018-03-17 13:22:00 | Neutral | Positive | 1.0 | comment | comment | 3 | amber_heard | 2 | [] | 0 |
| 2 | t1_dvug20l | /r/Celebs/comments/81y6lx/amber_heard/dvug20l/ | What? What did she hear? | t3_81y6lx | r/Celebs | Night_Chicken | 2018-03-17 13:23:50 | Neutral | Neutral | 1.0 | submission | comment | 5 | amber_heard | 2 | [] | 0 |
| 3 | t1_e3uftxn | /r/Celebs/comments/94w2gm/amber_heard/e3uftxn/ | What? What did she hear? | t3_94w2gm | r/Celebs | Night_Chicken | 2018-08-08 20:07:11 | Neutral | Neutral | 1.0 | submission | comment | 5 | amber_heard | 2 | [] | 0 |
| 4 | t1_e3ufvla | /r/Celebs/comments/94tmn2/amber_heard/e3ufvla/ | What? What did she hear? | t3_94tmn2 | r/Celebs | Night_Chicken | 2018-08-08 20:07:50 | Neutral | Neutral | 1.0 | submission | comment | 5 | amber_heard | 2 | [] | 0 |
df_night_chicken['permalink'].iloc[0]
'/r/Celebs/comments/7pysyp/amber_heard/dslw4u6/'
df_night_chicken.subreddit.value_counts().head(10)
r/Celebs    135
Name: subreddit, dtype: int64
df_night_chicken.text.value_counts().head()
What? What did she hear?    121
What? What did she hear?      4
I want to know as well.       1
What? Thank you!              1
Excellent! Not as exciting as the crunching and jostling sounds in Elon's full pockets which escaped her grasp.    1
Name: text, dtype: int64
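"What? What did she hear?" appears twice in the counts above, almost certainly differing only in whitespace; normalizing the text before counting collapses such near-duplicates. A sketch on toy strings:

```python
import pandas as pd

texts = pd.Series([
    "What? What did she hear?",
    "What?  What did she hear? ",   # extra internal/trailing whitespace
    "What? Thank you!",
])

# Collapse runs of whitespace and strip ends, so near-duplicate texts count together
normalized = texts.str.replace(r"\s+", " ", regex=True).str.strip()
print(normalized.value_counts().iloc[0])  # 2
```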
df_emily = df.query(" author == 'emilyguy' ").reset_index(drop=True)
print(df_emily.shape)
df_emily.head()
(38, 17)
| | child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | t3_82wa5c | /r/gentlemanboners/comments/82wa5c/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | emilyguy | 2018-03-08 09:30:44 | Neutral | Neutral | 3814.0 | NaN | submission | 2 | amber_heard | 2 | [] | 0 |
| 1 | t3_842r8g | /r/gentlemanboners/comments/842r8g/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | emilyguy | 2018-03-13 09:09:02 | Neutral | Neutral | 442.0 | NaN | submission | 2 | amber_heard | 2 | [] | 0 |
| 2 | t3_85j0o6 | /r/gentlemanboners/comments/85j0o6/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | emilyguy | 2018-03-19 12:18:13 | Neutral | Neutral | 595.0 | NaN | submission | 2 | amber_heard | 2 | [] | 0 |
| 3 | t3_8694b7 | /r/WatchItForThePlot/comments/8694b7/amber_hea... | Amber Heard has an amazing body | NaN | r/WatchItForThePlot | emilyguy | 2018-03-22 05:16:10 | Positive | Positive | 3165.0 | NaN | submission | 6 | amber_heard_has_an_amazing_body | 6 | [] | 0 |
| 4 | t3_8694cl | /r/celebsnaked/comments/8694cl/amber_heard/ | Amber Heard | NaN | r/celebsnaked | emilyguy | 2018-03-22 05:16:21 | Neutral | Neutral | 316.0 | NaN | submission | 2 | amber_heard | 2 | [] | 0 |
df_emily.submission_comment.value_counts()
submission    34
comment        4
Name: submission_comment, dtype: int64
df_emily['permalink'].iloc[0]
'/r/gentlemanboners/comments/82wa5c/amber_heard/'
df_emily.subreddit.value_counts().head(10)
r/gentlemanboners      19
r/Celebs                9
r/celebsnaked           4
r/celebnsfw             3
r/WatchItForThePlot     2
r/CaraDelevingne        1
Name: subreddit, dtype: int64
df_emily.text.value_counts().head(10)
Amber Heard                        31
The Informers (2008)                2
with Amber Heard                    1
The Informers                       1
The Playboy Club (2011)             1
Amber Heard and Jessica Alba        1
Amber Heard has an amazing body     1
Name: text, dtype: int64
df_users[df_users.user_name == 'ccrraapp']
| | user_name | has_verified_email | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year |
|---|---|---|---|---|---|---|---|---|---|---|
| 5162 | ccrraapp | True | True | False | False | 34499.0 | 114824.0 | 2012-03-11 19:58:27 | others | others |
df_crap = df.query(" author == 'ccrraapp' ").reset_index(drop=True)
print(df_crap.shape)
df_crap.head()
(34, 17)
| | child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | t1_e0pfa56 | /r/DCEUboners/comments/8r6fjx/amber_heard/e0pf... | Oh wow! | t3_8r6fjx | r/DCEUboners | ccrraapp | 2018-06-15 07:40:04 | Positive | Positive | 2.0 | submission | comment | 2 | amber_heard | 2 | [] | 0 |
| 1 | t1_e0pfbb5 | /r/DCEUboners/comments/8r6bq6/nicole_kidman_am... | Nicole Kidman looks stunning. At this point on... | t3_8r6bq6 | r/DCEUboners | ccrraapp | 2018-06-15 07:41:12 | Positive | Positive | 1.0 | submission | comment | 15 | nicole_kidman_amber_heard | 4 | [] | 0 |
| 2 | t1_e0x3uz0 | /r/DCEUboners/comments/8s4e6n/amber_heard/e0x3... | Her dress is so perfectly wrapping her. | t3_8s4e6n | r/DCEUboners | ccrraapp | 2018-06-19 09:09:46 | Positive | Positive | 1.0 | submission | comment | 7 | amber_heard | 2 | [] | 0 |
| 3 | t1_e2pq776 | /r/DCEUboners/comments/90dxcj/amber_heard/e2pq... | Did she lose a lot of weight for this role? T... | t3_90dxcj | r/DCEUboners | ccrraapp | 2018-07-20 08:23:49 | Positive | Neutral | 2.0 | submission | comment | 15 | amber_heard | 2 | [] | 0 |
| 4 | t1_e2psowf | /r/DCEUboners/comments/90dxcj/amber_heard/e2ps... | Yes but she looks slim than before. | t1_e2prpct | r/DCEUboners | ccrraapp | 2018-07-20 09:51:55 | Neutral | Positive | 2.0 | comment | comment | 7 | amber_heard | 2 | [] | 0 |
df_crap.subreddit.value_counts().head(10)
r/geekboners         19
r/DCEUboners         14
r/gentlemanboners     1
Name: subreddit, dtype: int64
df_crap.submission_comment.value_counts()
submission    27
comment        7
Name: submission_comment, dtype: int64
df_crap.text.value_counts().head(10)
[Aquaman] Amber Heard    16
Amber Heard              10
Idk what to tell you but she slimmed down a bit and is more athletic shaped since she started for Aquaman. Maybe thats why you don't see the old her.    1
Did she lose a lot of weight for this role? To fit in that suit?    1
Oh wow!    1
Yes but she looks slim than before.    1
Her dress is so perfectly wrapping her.    1
Nicole Kidman looks stunning. At this point one has to wonder is she even ageing?    1
She is. Here is [another one](https://i.imgur.com/QlmM5RS.jpg)\n\n    1
[Warcraft] Paula Patton [Aquaman] Amber Heard [Sin City] Jessica Alba    1
Name: text, dtype: int64
df_crap_contributions = df_crap.groupby(df_crap.created_at.dt.date).size().reset_index(name='n_contributions')
df_crap_contributions.sort_values('n_contributions', ascending=False).head(10)
| | created_at | n_contributions |
|---|---|---|
| 17 | 2018-12-06 | 3 |
| 4 | 2018-09-10 | 3 |
| 0 | 2018-06-15 | 2 |
| 9 | 2018-11-02 | 2 |
| 18 | 2018-12-09 | 2 |
| 15 | 2018-11-28 | 2 |
| 14 | 2018-11-27 | 2 |
| 13 | 2018-11-20 | 2 |
| 11 | 2018-11-13 | 2 |
| 10 | 2018-11-08 | 2 |
# check the date this account was created
df_users[df_users.user_name == 'AutoNewspaperAdmin']
| | user_name | has_verified_email | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year |
|---|---|---|---|---|---|---|---|---|---|---|
| 25721 | AutoNewspaperAdmin | True | True | False | False | 1.0 | 249122.0 | 2016-10-29 06:43:28 | others | others |
df_auto = df.query(" author == 'AutoNewspaperAdmin' ").reset_index(drop=True)
print(df_auto.shape)
df_auto.head()
(33, 17)
| | child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | t3_8a394f | /r/AutoNewspaper/comments/8a394f/entertainment... | [Entertainment] - Amber Heard says meeting Syr... | NaN | r/AutoNewspaper | AutoNewspaperAdmin | 2018-04-05 20:31:39 | Neutral | Neutral | 1.0 | NaN | submission | 14 | entertainment_amber_heard_says_meeting_syria | 6 | [] | 0 |
| 1 | t3_8a3i6o | /r/AutoNewspaper/comments/8a3i6o/entertainment... | [Entertainment] - Amber Heard says meeting Syr... | NaN | r/AutoNewspaper | AutoNewspaperAdmin | 2018-04-05 21:01:37 | Neutral | Neutral | 1.0 | NaN | submission | 14 | entertainment_amber_heard_says_meeting_syria | 6 | [] | 0 |
| 2 | t3_8jp0vd | /r/AutoNewspaper/comments/8jp0vd/lifestyle_in_... | [Lifestyle] - In Cannes, Amber Heard chats abo... | NaN | r/AutoNewspaper | AutoNewspaperAdmin | 2018-05-15 20:47:12 | Neutral | Neutral | 1.0 | NaN | submission | 16 | lifestyle_in_cannes_amber_heard_chats_about_big | 8 | [] | 0 |
| 3 | t3_8ttimo | /r/AutoNewspaper/comments/8ttimo/national_ambe... | [National] - Amber Heard, other celebs travel ... | NaN | r/AutoNewspaper | AutoNewspaperAdmin | 2018-06-25 19:46:01 | Negative | Negative | 1.0 | NaN | submission | 17 | national_amber_heard_other_celebs_travel_to_texas | 8 | [] | 0 |
| 4 | t3_8vur4r | /r/AutoNewspaper/comments/8vur4r/video_amber_h... | [Video] - Amber Heard upsets fans with offensi... | NaN | r/AutoNewspaper | AutoNewspaperAdmin | 2018-07-03 18:48:38 | Neutral | Negative | 1.0 | NaN | submission | 11 | video_amber_heard_upsets_fans_with_offensive | 7 | [] | 0 |
df_auto.subreddit.value_counts().head(10)
r/AutoNewspaper    33
Name: subreddit, dtype: int64
df_auto.text.value_counts().head(10)
[Entertainment] - Amber Heard says she is happy to have moved on with her life | ABC    2
[Entertainment] - Amber Heard: Time's Up movement 'has made incredible gains' | USA Today    1
[Entertainment] - Amber Heard says meeting Syria refugees left indelible mark | Miami Herald    1
[Entertainment] - WATCH: Amber Heard dishes on Jason Momoa's pranks on 'Aquaman' set | ABC    1
[Entertainment] - Amber Heard: Style Diary | USA Today    1
[Entertainment] - Janelle Monae, Hillary Clinton, Amber Heard hit the 2018 Glamour Women of the Year Awards | USA Today    1
[Video] - Amber Heard talks superhero roles for women | FOX    1
[Entertainment] - A couture swim cap? Amber Heard rocks an eye-popping headpiece at 'Aquaman' premiere | USA Today    1
[Lifestyle] - Scene City: Bill Clinton, Tony Blair and Amber Heard Made Calls for Charity | NY Times    1
[Entertainment] - Amber Heard says women face 'skepticism, hostility' when speaking out about abuse | USA Today    1
Name: text, dtype: int64
df_auto = df_auto.groupby(df_auto.created_at.dt.date).size().reset_index(name='n_contributions')
df_auto
| | created_at | n_contributions |
|---|---|---|
| 0 | 2018-04-05 | 2 |
| 1 | 2018-05-15 | 1 |
| 2 | 2018-06-25 | 1 |
| 3 | 2018-07-03 | 3 |
| 4 | 2018-09-14 | 1 |
| 5 | 2018-10-03 | 2 |
| 6 | 2018-10-10 | 1 |
| 7 | 2018-10-22 | 4 |
| 8 | 2018-10-25 | 5 |
| 9 | 2018-10-29 | 1 |
| 10 | 2018-11-13 | 1 |
| 11 | 2018-11-26 | 1 |
| 12 | 2018-11-27 | 2 |
| 13 | 2018-12-03 | 1 |
| 14 | 2018-12-05 | 2 |
| 15 | 2018-12-06 | 2 |
| 16 | 2018-12-12 | 1 |
| 17 | 2018-12-19 | 2 |
df_rad = df.query(" author == 'Rednaxela117' ").reset_index(drop=True)
print(df_rad.shape)
df_rad.head()
(33, 17)
| | child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | t3_7zgadd | /r/SexyWomanOfTheDay/comments/7zgadd/amber_hea... | Amber Heard: easy on the eyes | NaN | r/SexyWomanOfTheDay | Rednaxela117 | 2018-02-22 16:47:57 | Positive | Positive | 73.0 | NaN | submission | 6 | amber_heard_easy_on_the_eyes | 6 | [] | 0 |
| 1 | t3_7zgcdi | /r/SexyWomanOfTheDay/comments/7zgcdi/amber_hea... | Amber Heard: interesting swimwear | NaN | r/SexyWomanOfTheDay | Rednaxela117 | 2018-02-22 16:54:44 | Positive | Positive | 72.0 | NaN | submission | 4 | amber_heard_interesting_swimwear | 4 | [] | 0 |
| 2 | t1_dunqfh1 | /r/SexyWomanOfTheDay/comments/7zg975/amber_hea... | Apparently she's the most beautiful woman in t... | t3_7zg975 | r/SexyWomanOfTheDay | Rednaxela117 | 2018-02-22 16:57:06 | Positive | Positive | 2.0 | submission | comment | 13 | amber_heard_is_todays_sexy_woman_of_the_day | 9 | ['https://www.maxim.com'] | 1 |
| 3 | t3_7zgdf3 | /r/SexyWomanOfTheDay/comments/7zgdf3/amber_hea... | Amber Heard: as Mera | NaN | r/SexyWomanOfTheDay | Rednaxela117 | 2018-02-22 16:58:11 | Neutral | Neutral | 114.0 | NaN | submission | 4 | amber_heard_as_mera | 4 | [] | 0 |
| 4 | t3_7zgh46 | /r/SexyWomanOfTheDay/comments/7zgh46/amber_hea... | Amber Heard: lounging around | NaN | r/SexyWomanOfTheDay | Rednaxela117 | 2018-02-22 17:11:13 | Neutral | Neutral | 63.0 | NaN | submission | 4 | amber_heard_lounging_around | 4 | [] | 0 |
df_rad.subreddit.value_counts().head(10)
r/Celebs               11
r/SexyWomanOfTheDay    10
r/gentlemanboners       8
r/PrettyGirls           2
r/goddesses             1
r/celebritylegs         1
Name: subreddit, dtype: int64
df_rad['permalink'].iloc[5]
'/r/SexyWomanOfTheDay/comments/7zgozg/amber_heard_legs_for_days/'
df_rad.text.value_counts().head(10)
Amber Heard                          18
Crazy, not dumb.                      1
Don't we all haha                     1
Amber Heard: lounging around          1
Amber Heard: easy on the eyes         1
She's as hot as she is crazy haha     1
Red heads, you gotta love them.       1
Gorgeous with red hair in Aquaman     1
Amber Heard: as Mera                  1
Amber Heard: business casual          1
Name: text, dtype: int64
df_rad_contributions = df_rad.groupby(df_rad.created_at.dt.date).size().reset_index(name='n_contributions')
df_rad_contributions.sort_values('n_contributions', ascending=False).head(10)
| | created_at | n_contributions |
|---|---|---|
| 0 | 2018-02-22 | 8 |
| 3 | 2018-05-01 | 8 |
| 1 | 2018-02-23 | 2 |
| 2 | 2018-03-28 | 2 |
| 4 | 2018-05-02 | 2 |
| 13 | 2018-12-29 | 2 |
| 5 | 2018-05-03 | 1 |
| 6 | 2018-05-25 | 1 |
| 7 | 2018-07-10 | 1 |
| 8 | 2018-07-22 | 1 |
Check whether the users with the most contributions are mods, gold members, or have a verified email
df.author.value_counts().nlargest(n=25)
-banned-              1666
Night_Chicken          135
AutoModerator           50
emilyguy                38
AutoNewsAdmin           35
ccrraapp                34
Rednaxela117            33
AutoNewspaperAdmin      33
ZorakLocust             27
tenchineuro             22
RuleIV                  22
jeff98379               21
vonmark955              20
InfiniTitans            20
Chronos2016             20
bundt_trundler          18
ZadocPaet               18
nobodycares65           18
Queen1110               17
MightUlt-7              16
Count_Fapula1           16
FlexOutlaw              15
sagar7854               15
NaveHarder              14
AngelaStettner69        13
Name: author, dtype: int64
check_list = df.author.value_counts().nlargest(n=25).index.tolist()[1:]
check_list
['Night_Chicken', 'AutoModerator', 'emilyguy', 'AutoNewsAdmin', 'ccrraapp', 'Rednaxela117', 'AutoNewspaperAdmin', 'ZorakLocust', 'tenchineuro', 'RuleIV', 'jeff98379', 'vonmark955', 'InfiniTitans', 'Chronos2016', 'bundt_trundler', 'ZadocPaet', 'nobodycares65', 'Queen1110', 'MightUlt-7', 'Count_Fapula1', 'FlexOutlaw', 'sagar7854', 'NaveHarder', 'AngelaStettner69']
# get a data frame with the users who made the most contributions
df_check = df_users[df_users['user_name'].isin(check_list)]
print(df_check.shape)
df_check.head(2)
(24, 10)
| | user_name | has_verified_email | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year |
|---|---|---|---|---|---|---|---|---|---|---|
| 1520 | FlexOutlaw | True | False | False | False | 5217.0 | 154864.0 | 2010-09-25 18:36:46 | others | others |
| 4432 | AutoModerator | True | True | True | False | 1000.0 | 1000.0 | 2012-01-05 05:24:28 | others | others |
df_check['user_name'].nunique()
24
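Since the case study centers on newly created accounts, a useful derived feature is each account's age at the moment it posted, computable from `created_at` and `user_created_at` in the merged dataset. A sketch on toy rows (the 30-day cutoff and the account names are illustrative assumptions):

```python
import pandas as pd

# Toy merged rows: posting time plus account-creation time
merged = pd.DataFrame({
    "author": ["fresh_account", "old_account"],
    "created_at": pd.to_datetime(["2018-06-01", "2018-06-01"]),
    "user_created_at": pd.to_datetime(["2018-05-25", "2010-01-01"]),
})

# Age of the account (in days) at the moment it posted
merged["account_age_days"] = (merged.created_at - merged.user_created_at).dt.days

# Accounts younger than 30 days at posting time deserve a closer look
suspects = merged[merged.account_age_days < 30]
print(suspects.author.tolist())  # ['fresh_account']
```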
def get_stats(df):
    for col in df.columns:
        if col not in ['user_name', 'user_created_at']:
            if col not in ['link_karma', 'comment_karma']:
                print('The value counts of the users with the most contributions: ' + col)
                print(df[col].value_counts())
                print('\n')
            else:
                print("The min of {}".format(col), round(df[col].min(), 2))
                print('\n')
                print("The max of {}".format(col), round(df[col].max(), 2))
                print('\n')
                print("The mean of {}".format(col), round(df[col].mean(), 2))
                print('\n')
                print("The median of {}".format(col), round(df[col].median(), 2))
                print('\n')
get_stats(df_check)
The value counts of the users with the most contributions: has_verified_email
True     23
False     1
Name: has_verified_email, dtype: int64

The value counts of the users with the most contributions: is_mod
True     14
False    10
Name: is_mod, dtype: int64

The value counts of the users with the most contributions: is_gold
False    19
True      5
Name: is_gold, dtype: int64

The value counts of the users with the most contributions: is_banned
False    22
True      2
Name: is_banned, dtype: int64

The min of comment_karma -1.0
The max of comment_karma 280938.0
The mean of comment_karma 32688.95
The median of comment_karma 32688.95

The min of link_karma 860.0
The max of link_karma 2838485.0
The mean of link_karma 386394.59
The median of link_karma 386394.59

The value counts of the users with the most contributions: banned_unverified
others        21
banned         2
unverified     1
Name: banned_unverified, dtype: int64

The value counts of the users with the most contributions: creation_year
others    18
2018       4
banned     2
Name: creation_year, dtype: int64
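The numeric summaries printed above can also be produced in a single `agg` call instead of a print loop. A sketch on hypothetical karma values:

```python
import pandas as pd

# Toy karma columns standing in for df_check's numeric fields
df_check = pd.DataFrame({
    "comment_karma": [-1.0, 1000.0, 280938.0],
    "link_karma": [860.0, 1000.0, 2838485.0],
})

# One call computes all four statistics per column
stats = df_check[["comment_karma", "link_karma"]].agg(["min", "max", "mean", "median"])
print(stats.loc["median", "comment_karma"])  # 1000.0
```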
# pd.set_option('display.max_colwidth', None)
suspected_dict = {}
df_fuc = df[df.text.str.lower().str.contains('fuck')]
print(df_fuc.shape)
df_fuc.head()
(225, 17)
| | child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | t1_ds0ylr9 | /r/JerkOffToCelebs/comments/7nbhsf/amber_heard... | Tits look primed to get fucked, though yeah, I... | t3_7nbhsf | r/JerkOffToCelebs | -banned- | 2018-01-01 04:44:31 | Negative | Neutral | 2.0 | submission | comment | 14 | amber_heard_would_be_such_a_dirty_slut_in_bed | 10 | [] | 0 |
| 6 | t1_ds1ghu7 | /r/elonmusk/comments/7n76bc/amber_heard_and_el... | He's asking for it - he's living a soap opera,... | t1_ds0339s | r/elonmusk | -banned- | 2018-01-01 16:38:18 | Negative | Neutral | 2.0 | comment | comment | 81 | amber_heard_and_elon_musk_spotted_vacationing_in | 8 | [] | 0 |
| 13 | t1_ds2hdi8 | /r/elonmusk/comments/7mbv6q/amber_heard_and_el... | She's a fucking spider. | t3_7mbv6q | r/elonmusk | BoracayBatCave | 2018-01-02 05:29:24 | Negative | Neutral | 2.0 | submission | comment | 4 | amber_heard_and_elon_musk_are_reportedly_back | 8 | [] | 0 |
| 21 | t1_ds4unfi | /r/gentlemanboners/comments/7nolyx/amber_heard... | Damn Elon you fucked up | t3_7nolyx | r/gentlemanboners | -banned- | 2018-01-03 17:01:17 | Negative | Negative | 1.0 | submission | comment | 5 | amber_heard | 2 | [] | 0 |
| 35 | t3_7ohkn2 | /r/u_NeverOwnedAnyone/comments/7ohkn2/while_el... | While Elin Musk is riding high on his free pat... | NaN | u/NeverOwnedAnyone | -banned- | 2018-01-06 06:14:59 | Positive | Positive | 1.0 | NaN | submission | 29 | u_NeverOwnedAnyone | 2 | [] | 0 |
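One caveat on the mask used above: chaining `str.lower()` into `str.contains('fuck')` would fail if any `text` value were missing, because the resulting mask contains NA. `contains` takes `case` and `na` parameters that handle both concerns in one call. A sketch on toy strings:

```python
import pandas as pd

texts = pd.Series(["Fuck this", "nice photo", None])

# case=False handles capitalization; na=False keeps missing text out of the mask
mask = texts.str.contains("fuck", case=False, na=False)
print(mask.tolist())  # [True, False, False]
```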
# get the authors of submissions whose title is exactly 'fuck_amber_heard'
mask = (df['submission_text'] == 'fuck_amber_heard') & (df['submission_comment'] == 'submission')
df_sub = df[mask]
print(df_sub.shape)
with pd.option_context('display.max_colwidth', None):
display(df_sub.head())
(0, 17)
| | child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
# get the authors of submissions whose title contains the word 'fuck'
mask = (df['submission_text'].str.contains('fuck')) & (df['submission_comment'] == 'submission')
df_sub_fuc = df[mask]
print(df_sub_fuc.shape)
with pd.option_context('display.max_colwidth', None):
display(df_sub_fuc.head())
(15, 17)
| | child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 107 | t3_7rq9ap | /r/dirtypenpals/comments/7rq9ap/22f4a_anyone_with_a_fucked_up_mind_have_a_crush/ | 22f4a - Anyone with a fucked up mind have a crush on Amber Heard, Lexi Belle, Lauren Cohan, Shay Mitchell? | NaN | r/dirtypenpals | -banned- | 2018-01-20 13:02:08 | Negative | Negative | 1.0 | NaN | submission | 20 | 22f4a_anyone_with_a_fucked_up_mind_have_a_crush | 10 | [] | 0 |
| 265 | t3_7wiqf8 | /r/JerkOffToCelebs/comments/7wiqf8/amber_heard_deserves_a_good_rough_fuck/ | Amber Heard deserves a good rough fuck. | NaN | r/JerkOffToCelebs | arsenalmat | 2018-02-10 02:40:27 | Positive | Neutral | 20.0 | NaN | submission | 7 | amber_heard_deserves_a_good_rough_fuck | 7 | [] | 0 |
| 433 | t3_82lvbs | /r/gentlemanboners/comments/82lvbs/amber_heard_left_fuck_it_id_still_tap_her/ | Amber heard (left): fuck it I’d still tap her!$!! | NaN | r/gentlemanboners | -banned- | 2018-03-07 05:21:31 | Negative | Negative | 0.0 | NaN | submission | 9 | amber_heard_left_fuck_it_id_still_tap_her | 9 | [] | 0 |
| 1055 | t3_8cflf9 | /r/JerkOffToCelebs/comments/8cflf9/amber_heard_is_ready_to_get_fucked_and_swallow_cum/ | Amber Heard is ready to get fucked and swallow cum | NaN | r/JerkOffToCelebs | -banned- | 2018-04-15 14:51:48 | Negative | Neutral | 21.0 | NaN | submission | 10 | amber_heard_is_ready_to_get_fucked_and_swallow_cum | 10 | [] | 0 |
| 1769 | t3_8p6csr | /r/JerkOffToCelebs/comments/8p6csr/amber_heard_is_so_fucking_hot/ | Amber Heard is so fucking hot | NaN | r/JerkOffToCelebs | arsenalmat | 2018-06-07 01:42:34 | Positive | Neutral | 34.0 | NaN | submission | 6 | amber_heard_is_so_fucking_hot | 6 | [] | 0 |
df_sub_fuc_contributions = df_sub_fuc.groupby(df_sub_fuc.created_at.dt.date).size().reset_index(name='n_contributions')
fig = px.bar(df_sub_fuc_contributions,
x='created_at',
y='n_contributions', title='The number of submissions with the word "F*CK" in 2018')
fig.update_traces(marker_color='red', marker_line_width=.5, opacity=1, textposition='auto')
# , marker_line_color='#5296dd'
fig.show()
df_sub_fuc_contributions.sort_values('n_contributions', ascending=False)
| | created_at | n_contributions |
|---|---|---|
| 11 | 2018-12-28 | 3 |
| 9 | 2018-09-27 | 2 |
| 0 | 2018-01-20 | 1 |
| 1 | 2018-02-10 | 1 |
| 2 | 2018-03-07 | 1 |
| 3 | 2018-04-15 | 1 |
| 4 | 2018-06-07 | 1 |
| 5 | 2018-06-16 | 1 |
| 6 | 2018-06-28 | 1 |
| 7 | 2018-07-08 | 1 |
| 8 | 2018-07-22 | 1 |
| 10 | 2018-12-25 | 1 |
df_fuc.author.value_counts().head(10)
-banned-            65
Count_Fapula1        4
DariJC               3
TEDHARDYLEEN         2
TocTheElder          2
dirtydegrading       2
RedditZacuzzi        2
cornylamygilbert     2
chi_dist90           2
arsenalmat           2
Name: author, dtype: int64
df_fuc.submission_comment.value_counts()
comment       206
submission     19
Name: submission_comment, dtype: int64
df_fuc.subreddit.value_counts().head(10)
r/JerkOffToCelebs      54
r/gentlemanboners      18
r/WatchItForThePlot    17
r/movies               14
r/Celebs               12
r/celebJObuds          11
r/celebnsfw            10
r/MensRights           10
r/DC_Cinematic          9
r/goddesses             8
Name: subreddit, dtype: int64
df_fuc.created_at.dt.date.value_counts().head(10)
2018-12-19    14
2018-12-20    10
2018-05-02     5
2018-06-06     5
2018-08-15     5
2018-12-28     5
2018-08-11     5
2018-07-03     4
2018-11-19     4
2018-08-10     4
Name: created_at, dtype: int64
df_fuc_contributions = df_fuc.groupby(df_fuc.created_at.dt.date).size().reset_index(name='n_contributions')
fig = px.bar(df_fuc_contributions,
x='created_at',
y='n_contributions', title='The number of "F*CK" contributions in 2018')
fig.update_traces(marker_color='red', marker_line_width=1, opacity=1, textposition='auto')
# , marker_line_color='#5296dd'
fig.show()
df_fuc_contributions.sort_values('n_contributions', ascending=False).head(10)
| | created_at | n_contributions |
|---|---|---|
| 96 | 2018-12-19 | 14 |
| 97 | 2018-12-20 | 10 |
| 24 | 2018-05-02 | 5 |
| 61 | 2018-08-15 | 5 |
| 103 | 2018-12-28 | 5 |
| 59 | 2018-08-11 | 5 |
| 35 | 2018-06-06 | 5 |
| 88 | 2018-11-19 | 4 |
| 27 | 2018-05-08 | 4 |
| 58 | 2018-08-10 | 4 |
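Burst days like 2018-12-19/20 above can also be flagged programmatically rather than read off the chart. A minimal sketch on toy daily counts (the mean-plus-two-standard-deviations threshold is an illustrative assumption, not part of the original analysis):

```python
import pandas as pd

# Toy daily contribution counts with one clear burst (hypothetical values)
daily = pd.Series(
    [1, 1, 2, 1, 14, 1, 1, 1],
    index=pd.date_range("2018-12-13", periods=8),
    name="n_contributions",
)

# Flag days more than two standard deviations above the daily mean
threshold = daily.mean() + 2 * daily.std()
spike_days = daily[daily > threshold]
print(spike_days)
```

Days flagged this way are good candidates for checking whether the posting accounts were created shortly beforehand.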
Top users of the word "f*ck"

Count_Fapula1: used the word "f*ck" 4 times (negative comments)
df_count = df.query(" author == 'Count_Fapula1' ")
print(df_count.shape)
df_count.head()
(16, 17)
| | child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5439 | t3_a37wmy | /r/celebJObuds/comments/a37wmy/need_a_bud_to_r... | Need a bud to RP as Amber Heard for me | NaN | r/celebJObuds | Count_Fapula1 | 2018-12-05 02:40:39 | Neutral | Neutral | 19.0 | NaN | submission | 10 | need_a_bud_to_rp_as_amber_heard_for_me | 10 | [] | 0 |
| 5440 | t3_a381td | /r/JerkOffToCelebs/comments/a381td/what_id_giv... | What I'd give to have Amber Heard worshipping ... | NaN | r/JerkOffToCelebs | Count_Fapula1 | 2018-12-05 02:57:38 | Neutral | Neutral | 52.0 | NaN | submission | 10 | what_id_give_to_have_amber_heard_worshipping_my | 9 | [] | 0 |
| 5463 | t3_a3fhl3 | /r/celebJObuds/comments/a3fhl3/still_looking_f... | Still looking for a bud to RP as Amber Heard a... | NaN | r/celebJObuds | Count_Fapula1 | 2018-12-05 19:14:12 | Neutral | Positive | 5.0 | NaN | submission | 14 | still_looking_for_a_bud_to_rp_as_amber_heard_and | 11 | [] | 0 |
| 5465 | t1_eb5q31k | /r/JerkOffToCelebs/comments/a381td/what_id_giv... | Right? She looks like the type that just *love... | t1_eb5pyp4 | r/JerkOffToCelebs | Count_Fapula1 | 2018-12-05 19:15:59 | Positive | Neutral | 4.0 | comment | comment | 11 | what_id_give_to_have_amber_heard_worshipping_my | 9 | [] | 0 |
| 5796 | t1_ebqzlr1 | /r/JerkOffToCelebs/comments/a5twvl/amber_heard... | Right? Sometimes I browse r/Amber_Heard and I'... | t3_a5twvl | r/JerkOffToCelebs | Count_Fapula1 | 2018-12-14 04:57:01 | Positive | Neutral | 3.0 | submission | comment | 11 | amber_heard_is_so_damn_hot | 6 | [] | 0 |
df_count.created_at.dt.date.value_counts()
2018-12-05    4
2018-12-20    4
2018-12-26    2
2018-12-18    2
2018-12-23    1
2018-12-14    1
2018-12-29    1
2018-12-24    1
Name: created_at, dtype: int64
df_count.text.value_counts().head(3)
Amber, because I’d rather fuck Scarlett’s sweet pussy.    1
I can’t wait to see it. God, she looks so good with that red hair    1
Still looking for a bud to RP as Amber Heard and help me cum    1
Name: text, dtype: int64
df_count[df_count.author == 'Count_Fapula1'].submission_comment.value_counts()
comment       13
submission     3
Name: submission_comment, dtype: int64
df_count[df_count.author == 'Count_Fapula1'].text.value_counts()
Amber, because I’d rather fuck Scarlett’s sweet pussy.    1
I can’t wait to see it. God, she looks so good with that red hair    1
Still looking for a bud to RP as Amber Heard and help me cum    1
Fuck, she's gorgeous.    1
Oh geez... I need to see this fucking movie.    1
Right? Sometimes I browse r/Amber_Heard and I'm totally smitten. Jesus Christ...    1
God, she looks so sexy in this suit...    1
Is she on top of me like she is in the gif? Because if that's the case, we're going full on cowgirl.    1
What I'd give to have Amber Heard worshipping my cock    1
Need a bud to RP as Amber Heard for me    1
Those lips 🤤    1
Me fuckin too, man    1
Right? She looks like the type that just *loves* sucking cock    1
God, she's so sexy    1
Imagine her in this suit with a ton of cum dripping off her chin and down her cleavage...    1
Amber is one of my favorites to cum for, you've been missing out...    1
Name: text, dtype: int64
Note: Count_Fapula1 used the word f*ck three times across these contributions.
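The word-count observation above can be reproduced programmatically. This is a minimal sketch (not from the notebook) of a helper that counts case-insensitive occurrences of a word per author; the toy frame mirrors the `author`/`text` columns of the dataframes used throughout.

```python
import pandas as pd

def word_frequency(texts, word):
    """Count case-insensitive occurrences of `word` across an iterable of texts."""
    return sum(t.lower().count(word.lower()) for t in texts)

# toy data mirroring the structure of df_count[['author', 'text']]
toy = pd.DataFrame({
    'author': ['a', 'a', 'b'],
    'text': ['Fuck this', 'fuck, fuck', 'no match'],
})
counts = toy.groupby('author')['text'].apply(lambda s: word_frequency(s, 'fuck'))
# counts['a'] == 3, counts['b'] == 0
```

The same `groupby`/`apply` pattern could be run over `df` to rank authors by how often they use any given term.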
Negative Comments
df_dari = df.query(" author == 'DariJC' ")
print(df_dari.shape)
df_dari.head()
(11, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 104 | t1_dsygyxg | /r/JerkOffToCelebs/comments/7roj2h/amber_heard... | She would be a really fun fuck... | t3_7roj2h | r/JerkOffToCelebs | DariJC | 2018-01-20 05:38:22 | Negative | Positive | 13.0 | submission | comment | 7 | amber_heardtight_and_wet | 4 | [] | 0 |
| 3888 | t1_e4xbxip | /r/JerkOffToCelebs/comments/9aqp5j/amber_heard... | Giving her a rough fuck would be incredibly fu... | t3_9aqp5j | r/JerkOffToCelebs | DariJC | 2018-08-27 17:03:46 | Negative | Neutral | 2.0 | submission | comment | 35 | amber_heard | 2 | [] | 0 |
| 4011 | t1_e52f18p | /r/JerkOffToCelebs/comments/9be7uy/amber_heard... | Needa tear off those panties and take her righ... | t3_9be7uy | r/JerkOffToCelebs | DariJC | 2018-08-29 23:42:09 | Positive | Neutral | 2.0 | submission | comment | 10 | amber_heard_is_freshening_up_before_the_next_r... | 9 | [] | 0 |
| 4145 | t1_e6rez2z | /r/celebJObuds/comments/9jh06j/id_fuck_the_shi... | Lightly push her forward and lift that dress u... | t3_9jh06j | r/celebJObuds | DariJC | 2018-09-27 22:06:24 | Positive | Neutral | 3.0 | submission | comment | 12 | id_fuck_the_shit_out_of_amber_heard_in_that | 10 | [] | 0 |
| 4148 | t1_e6rf3qo | /r/celebJObuds/comments/9jh06j/id_fuck_the_shi... | She def looks like she’s into the rough stuff | t1_e6rf0x6 | r/celebJObuds | DariJC | 2018-09-27 22:08:17 | Negative | Positive | 5.0 | comment | comment | 9 | id_fuck_the_shit_out_of_amber_heard_in_that | 10 | [] | 0 |
df_dari.created_at.dt.date.value_counts()
2018-11-19 2 2018-09-27 2 2018-11-04 2 2018-08-27 1 2018-11-14 1 2018-12-31 1 2018-08-29 1 2018-01-20 1 Name: created_at, dtype: int64
df_dari.text.value_counts().head(3)
Unzip her and have her bend over slightly, bracing against the window. Put on a show for everyone going by... 2 She def looks like she’s into the rough stuff 1 Giving her a rough fuck would be incredibly fun. Doggystyle, smacking her ass making her moan and scream. Seeing her lay on her stomach, breathing hard recovering afterwards, her blonde hair messily covering her face... 1 Name: text, dtype: int64
df_dari[df_dari.author == 'DariJC'].submission_comment.value_counts()
comment 11 Name: submission_comment, dtype: int64
df_dari[df_dari.author == 'DariJC'].text.value_counts()
Unzip her and have her bend over slightly, bracing against the window. Put on a show for everyone going by... 2 She def looks like she’s into the rough stuff 1 Giving her a rough fuck would be incredibly fun. Doggystyle, smacking her ass making her moan and scream. Seeing her lay on her stomach, breathing hard recovering afterwards, her blonde hair messily covering her face... 1 Needa tear off those panties and take her right there 1 She’d be an incredible fuck, especially looking like she does in this pic 1 She would be a really fun fuck... 1 Lightly push her forward and lift that dress up over her ass... 1 Very slutty look to her 1 She’d be great lol. Reminds me of Gwen Diamond in her IR scenes 1 Love the push up she’s got going on in this movie 1 Name: text, dtype: int64
df.text.value_counts().head(25)
Amber Heard 1045 [deleted] 171 What? What did she hear? 121 [removed] 52 [Aquaman] Amber Heard 47 Amber Heard - The Informers 20 Amber heard 14 amber heard 13 Your submission has been automatically removed for not including a valid category/subcategory tag. Tags are essential to an optimal browsing experience for our users.\n\nSince your post was removed automatically, you are free to resubmit it with an appropriate tag. You can find the tagging guide [here](/r/DC_Cinematic/wiki/linkflair#wiki_automated_tagging). Add a valid and appropriate tag in your submission title. Choose wisely, as posts with misleading tags are subject to removal.\n\n**Message the moderators if your post was removed despite being tagged with an input from the category list.**\n\n\n*I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/DC_Cinematic) if you have any questions or concerns.* 12 The following links may contain NSFW material. Some links may not work correctly in mobile Reddit apps.\r\n\r\n---\r\n\r\n**Amber Heard**\r\n\r\n[Instagram](https://www.instagram.com/amberheard_)\r\n\r\n[Twitter](https://www.Twitter.com/realamberheard)\r\n\r\n[YouTube](https://www.youtube.com/channel/UCB2ySslhqSI7Vd2nx0UHvSw)\r\n\r\n[Facebook](https://www.Facebook.com/AmberHeardOfficial)\r\n\r\n/r/Amber_Heard\r\n\r\n[IMDB](https://www.IMDB.com/name/nm1720028)\r\n\r\n[Google Image Search](https://www.google.com/search?tbm=isch&q=Amber+Heard)\r\n\r\n[YouTube Search](https://www.youtube.com/results?search_query=Amber+Heard)\r\n\r\n[More photos in this subreddit](https://www.reddit.com/r/BeautifulFemales/search?q=Amber+Heard&restrict_sr=on&sort=new&t=all)\r\n\r\nIf you know of more social media info for this person, reply to this comment with the relevant information and it will be added to the bot's database.\r\n\r\n---\r\nThis was posted by a bot. 
If you discover a problem with it or have suggestions on how to improve it, please [contact LittleMisfit](https://www.reddit.com/message/compose/?to=littlemisfit). 9 Pinned to Amber Heard on Pinterest 8 What did she hear? 8 Amber Heard - The Informers (2008) 7 MFK 7 Amber Heard - London Fields 6 Wow 6 Her name is a complete sentence 6 amber heard lifestyle 5 Source? 5 What did Amber hear? 5 LONDON FIELDS Official Trailer (2018) Amber Heard, Johnny Depp Movie HD 5 Amber 5 MKF 5 Kik: MrBate247 5 What? What did she hear? 4 Name: text, dtype: int64
Notes:
There are several odd texts that are repeated many times, such as:
text = "what did she hear"
df_hear = df[df.text.str.lower().str.contains(text)]
print(df_hear.shape)
with pd.option_context('display.max_colwidth', None):
display(df_hear.head())
(148, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 68 | t1_dslw4u6 | /r/Celebs/comments/7pysyp/amber_heard/dslw4u6/ | What? What did she hear? | t3_7pysyp | r/Celebs | Night_Chicken | 2018-01-13 04:48:41 | Neutral | Neutral | -6.0 | submission | comment | 5 | amber_heard | 2 | [] | 0 |
| 119 | t1_dt561qg | /r/goddesses/comments/7sgl66/amber_heard/dt561qg/ | Well..? What did she hear?! | t3_7sgl66 | r/goddesses | HEYL1STEN | 2018-01-24 00:46:00 | Neutral | Neutral | 14.0 | submission | comment | 5 | amber_heard | 2 | [] | 0 |
| 357 | t1_duozpap | /r/WtSSTaDaMiT/comments/7zjjab/amber_heard/duozpap/ | What did she hear? | t3_7zjjab | r/WtSSTaDaMiT | PigeonMan45 | 2018-02-23 05:39:55 | Neutral | Neutral | 1.0 | submission | comment | 4 | amber_heard | 2 | [] | 0 |
| 369 | t1_dus4aaq | /r/BeautifulFemales/comments/7zzek8/amber_heard/dus4aaq/ | What did she hear? | t3_7zzek8 | r/BeautifulFemales | predictablePosts | 2018-02-25 00:27:04 | Neutral | Neutral | 1.0 | submission | comment | 4 | amber_heard | 2 | [] | 0 |
| 381 | t1_duwuoob | /r/DCEUboners/comments/80f3q9/amber_heard/duwuoob/ | What did she hear? | t3_80f3q9 | r/DCEUboners | iplayinv3rtd | 2018-02-27 16:41:33 | Neutral | Neutral | 1.0 | submission | comment | 4 | amber_heard | 2 | [] | 0 |
df_hear.subreddit.value_counts()
r/Celebs 131 r/goddesses 6 r/celebnsfw 2 r/sketches 1 r/CelebrityArmpits 1 r/WtSSTaDaMiT 1 r/CelebrityButts 1 r/Celebhub 1 r/BeautifulFemales 1 r/DCEUboners 1 r/PrettyGirls 1 r/CelebrityFeet 1 Name: subreddit, dtype: int64
df_hear.author.value_counts()
Night_Chicken 128 MyKey18 1 murph420 1 VVombatCombat 1 brownsatin 1 christpie 1 Matty_tt 1 PigeonMan45 1 omre16 1 iplayinv3rtd 1 HollowedHunter 1 no_di 1 Augustus420 1 cynicaldotes 1 moemoolah37 1 Sonmeisterbank 1 HEYL1STEN 1 ArtemisSkrivey 1 ImJumentous 1 predictablePosts 1 empifer 1 Name: author, dtype: int64
df_hear[df_hear.author == 'Night_Chicken'].submission_comment.value_counts()
comment 128 Name: submission_comment, dtype: int64
df_hear[df_hear.author == 'Night_Chicken'].subreddit.value_counts()
r/Celebs 128 Name: subreddit, dtype: int64
128 of the 148 contributions come from a single user, Night_Chicken; all of them are comments, and all are in one subreddit, r/Celebs.
df_hear_contributions = df_hear.groupby(df_hear.created_at.dt.date).size().reset_index(name='n_contributions')
fig = px.bar(df_hear_contributions,
x='created_at',
y='n_contributions', title='The number of "What? What did she hear?" contributions in 2018')
fig.update_traces(marker_color='red', marker_line_width=1, opacity=1, textposition='auto')
# , marker_line_color='#5296dd'
fig.show()
df_hear_contributions.sort_values('n_contributions', ascending=False).head(5)
| created_at | n_contributions | |
|---|---|---|
| 17 | 2018-08-29 | 103 |
| 22 | 2018-09-24 | 4 |
| 28 | 2018-10-14 | 4 |
| 6 | 2018-03-13 | 2 |
| 15 | 2018-08-08 | 2 |
df_hear[df_hear.created_at.dt.date.astype(str) == '2018-08-29'].author.value_counts()
Night_Chicken 103 Name: author, dtype: int64
df_hear[df_hear.created_at.dt.date.astype(str) == '2018-08-29'].submission_comment.value_counts()
comment 103 Name: submission_comment, dtype: int64
df_hear[df_hear.created_at.dt.date.astype(str) == '2018-08-29'].subreddit.value_counts()
r/Celebs 103 Name: subreddit, dtype: int64
max_t = df_hear[df_hear.created_at.dt.date.astype(str) == '2018-08-29'].created_at.dt.time.max()
max_t
datetime.time(4, 8, 48)
min_t = df_hear[df_hear.created_at.dt.date.astype(str) == '2018-08-29'].created_at.dt.time.min()
min_t
datetime.time(3, 41, 58)
from datetime import datetime, date
datetime.combine(date.today(), max_t) - datetime.combine(date.today(), min_t)
datetime.timedelta(seconds=1610)
1610 / 60
26.833333333333332
103 contributions with the same text, all of them comments, posted in a single day (2018-08-29) by one user (Night_Chicken) in one subreddit (r/Celebs), within only 27 minutes.
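The min/max arithmetic above generalizes to a small rate helper. This is a sketch (not in the notebook) that computes the time span of a burst and its posting rate; the two timestamps reproduce the 2018-08-29 window found above.

```python
from datetime import datetime

def burst_span(timestamps):
    """Return the timedelta between the earliest and latest timestamp."""
    return max(timestamps) - min(timestamps)

# the 2018-08-29 burst: 103 comments between 03:41:58 and 04:08:48
ts = [datetime(2018, 8, 29, 3, 41, 58), datetime(2018, 8, 29, 4, 8, 48)]
span = burst_span(ts)
minutes = span.total_seconds() / 60
# span.total_seconds() == 1610, i.e. ~26.8 minutes;
# 103 posts in that window is roughly 3.8 posts per minute,
# a cadence that is hard to sustain for a human typing by hand
```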
text = "[aquaman] amber heard"
df_aqua = df[df.text.str.lower().str.contains(text)]
print(df_aqua.shape)
with pd.option_context('display.max_colwidth', None):
display(df_aqua.head())
(30, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 107 | t3_7rq9ap | /r/dirtypenpals/comments/7rq9ap/22f4a_anyone_with_a_fucked_up_mind_have_a_crush/ | 22f4a - Anyone with a fucked up mind have a crush on Amber Heard, Lexi Belle, Lauren Cohan, Shay Mitchell? | NaN | r/dirtypenpals | -banned- | 2018-01-20 13:02:08 | Negative | Negative | 1.0 | NaN | submission | 20 | 22f4a_anyone_with_a_fucked_up_mind_have_a_crush | 10 | [] | 0 |
| 153 | t3_7ur89i | /r/REPORT_ITALIANO/comments/7ur89i/il_nerd_elon_musk_e_la_modella_amber_heard_si/ | Il nerd Elon Musk e la modella Amber Heard si sono lasciati (ancora) | NaN | r/REPORT_ITALIANO | report_italiano | 2018-02-02 13:19:18 | Neutral | Negative | 1.0 | NaN | submission | 13 | REPORT_ITALIANO | 2 | [] | 0 |
| 229 | t1_dtuxiz0 | /r/moviescirclejerk/comments/7vt6wh/amber_heard/dtuxiz0/ | >I’m not referring to Amber Heard not having the acting chops to play Mera (although she’s not exactly the most memorable actress out there), I’m more so referring to the fact that I honestly don’t think Heard is a very good person. It’s pretty ironic to me that people are outraged over Johnny Depp playing Grindelwald, when I honestly think that if any actor in a 2018 WB blockbuster deserves outrage, it’s Amber Heard.\n\n>I’m going to go ahead and say that I’m convinced at this point that she has been lying about being abused by Johnny Depp. I know that not believing the accuser often comes across as a bad thing to do, but there is evidence to suggest that she has been trying to frame Depp. Not only did the police apparently find zero evidence to suggest that Depp assaulted her, but the supposed video that Heard presented to “prove” that she was assaulted didn’t even show him doing anything of the sort. I’ve watched the video several times, and while there’s no denying that Depp is angry in it, none of it looks like it’s directed towards Heard. At no point in the video is he shown being physically or verbally abusive to her, nor does she seem like she’s scared of him in the slightest. This would of course explain why Heard hasn’t presented that “evidence” to court.\n\n>Furthermore, Heard demanded money from Depp from this whole ordeal. I know that people respond to abuse differently, but if she is truly someone who wanted out of an abusive relationship because she feared for her life, why exactly was her first instinct to try and get a piece of Johnny Depp’s money? 
This is not even getting into the fact that between the two of them, the only one who’s actually been arrested for domestic violence has been Amber Heard herself.\n\n>Frankly, I think that Amber Heard is a very shady, self-centered, and possibly narcissistic individual who only married Johnny Depp because she wanted his money, and wanted to enjoy the publicity of being someone who was “abused” by an A-lister.\n\n>With all this in mind, I really don’t like that she’s going to be a major character in Aquaman. I’ll still see the movie regardless, but I really don’t like that it has Amber Heard playing the protagonist’s love interest. | t3_7vt6wh | r/moviescirclejerk | Baramos_ | 2018-02-07 03:23:41 | Positive | Neutral | 9.0 | submission | comment | 394 | amber_heard | 2 | [] | 0 |
| 247 | t1_dtwk5f6 | /r/DC_Cinematic/comments/7vr5ia/discussion_amber_heard_playing_mera_rubs_me_the/dtwk5f6/ | It kind of seemed like he was minding his own business at first. Also, according to reports, Heard was apparently “egging him on” and heavily edited the video: http://www.tmz.com/2016/08/12/johnny-depp-amber-heard-throws-wine-glass-domestic-violence-video/. \n\nAlso, need I once again point out that between the two of them, the only one who’s been arrested for domestic violence has been Amber Heard herself? | t1_dtwi26a | r/DC_Cinematic | ZorakLocust | 2018-02-07 23:44:55 | Positive | Negative | 2.0 | comment | comment | 56 | DC_Cinematic | 2 | ['http://www.tmz.com'] | 1 |
| 253 | t1_dtwzyeo | /r/DC_Cinematic/comments/7vr5ia/discussion_amber_heard_playing_mera_rubs_me_the/dtwzyeo/ | Ben Affleck’s received plenty of backlash for his actions. I don’t see much of anyone bringing up what a lousy person Amber Heard is though. | t1_dtwut2v | r/DC_Cinematic | ZorakLocust | 2018-02-08 04:39:04 | Negative | Negative | 2.0 | comment | comment | 25 | DC_Cinematic | 2 | [] | 0 |
df_aqua.subreddit.value_counts()
r/DC_Cinematic 5 r/news 3 r/newstweetfeed 2 r/worldnews 2 r/Elon_musketeers 2 r/movies 2 r/AutoNewspaper 1 r/Amber_Heard 1 r/dirtypenpals 1 r/EnoughMuskSpam 1 r/morganwade 1 r/pickoneceleb 1 r/goddesses 1 r/FOXauto 1 r/removalbot 1 r/Spillthetea 1 r/entertainment 1 r/TwoXChromosomes 1 r/REPORT_ITALIANO 1 r/moviescirclejerk 1 Name: subreddit, dtype: int64
df_aqua.author.value_counts()
-banned- 5 BSRussell 2 ZorakLocust 2 jeff98379 2 iDevice_Help 2 trendynewsupdate 1 PHANTOMCREEPER 1 report_italiano 1 OwlWayneOwlwards 1 illegitimatemexican 1 Chronos2016 1 KrishAndChips 1 Nuggetry 1 Baramos_ 1 BrkntKlc 1 morganwade 1 removalbot 1 viralreportnow 1 AutoNewspaperAdmin 1 AutoNewsAdmin 1 nomnomnomhangry 1 worldwide__master 1 Name: author, dtype: int64
text = "Amber Heard - The Informers".lower()
df_inf = df[df.text.str.lower().str.contains(text)]
print(df_inf.shape)
with pd.option_context('display.max_colwidth', None):
display(df_inf.head())
(27, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 450 | t3_82xah3 | /r/WatchItForThePlot/comments/82xah3/amber_heard_the_informers_2008/ | Amber Heard - The Informers (2008) | NaN | r/WatchItForThePlot | -banned- | 2018-03-08 12:58:59 | Neutral | Neutral | 264.0 | NaN | submission | 6 | amber_heard_the_informers_2008 | 5 | [] | 0 |
| 1264 | t3_8hx1bz | /r/WatchItForThePlot/comments/8hx1bz/amber_heard_the_informers/ | Amber Heard - The Informers | NaN | r/WatchItForThePlot | -banned- | 2018-05-08 14:18:26 | Neutral | Neutral | 3749.0 | NaN | submission | 5 | amber_heard_the_informers | 4 | [] | 0 |
| 1282 | t3_8hzcyg | /r/CelebsGW/comments/8hzcyg/amber_heard_the_informers/ | Amber Heard - The Informers | NaN | r/CelebsGW | CelebsGW | 2018-05-08 19:06:20 | Neutral | Neutral | 91.0 | NaN | submission | 5 | amber_heard_the_informers | 4 | [] | 0 |
| 2225 | t3_8vslyy | /r/WatchItForThePlot/comments/8vslyy/amber_heard_the_informers_2008/ | Amber Heard - The Informers (2008) | NaN | r/WatchItForThePlot | -banned- | 2018-07-03 14:26:24 | Neutral | Neutral | 279.0 | NaN | submission | 6 | amber_heard_the_informers_2008 | 5 | [] | 0 |
| 2825 | t3_91crfc | /r/WatchItForThePlot/comments/91crfc/amber_heard_the_informers/ | Amber Heard - The Informers | NaN | r/WatchItForThePlot | Ezio9619 | 2018-07-24 00:51:55 | Neutral | Neutral | 459.0 | NaN | submission | 5 | amber_heard_the_informers | 4 | [] | 0 |
df_inf.subreddit.value_counts()
r/WatchItForThePlot 8 r/celebnsfw 3 r/celebsnaked 3 r/CelebSexScenes 2 r/Celebsnudess 2 r/Amber_Heard 2 r/PopCultureGifs 2 r/CelebsGW 1 r/nsfwcelebgifs 1 r/adultgifs 1 u/pccpux 1 r/Celebhub 1 Name: subreddit, dtype: int64
df_inf.author.value_counts()
-banned- 9 Ezio9619 8 vonmark955 4 vonjobi951 2 GRJR721 2 CelebsGW 1 vonjobi956 1 Name: author, dtype: int64
df_inf[df_inf.author == 'Ezio9619'].subreddit.value_counts()
r/celebsnaked 3 r/WatchItForThePlot 3 r/celebnsfw 1 r/Celebhub 1 Name: subreddit, dtype: int64
df_inf[df_inf.author == 'Ezio9619'].created_at.dt.date.value_counts()
2018-07-24 4 2018-08-14 4 Name: created_at, dtype: int64
text = "Amber Heard".lower()
df_ah = df[df.text.str.lower().str.contains(text)]
print(df_ah.shape)
with pd.option_context('display.max_colwidth', None):
display(df_ah.head())
(2213, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | t3_7nkbt3 | /r/gentlemanboners/comments/7nkbt3/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | ZadocPaet | 2018-01-02 05:07:43 | Neutral | Neutral | 5.0 | NaN | submission | 2 | amber_heard | 2 | [] | 0 |
| 11 | t3_7nkbua | /r/DCEUboners/comments/7nkbua/amber_heard/ | Amber Heard | NaN | r/DCEUboners | ZadocPaet | 2018-01-02 05:07:55 | Neutral | Neutral | 45.0 | NaN | submission | 2 | amber_heard | 2 | [] | 0 |
| 16 | t3_7nolyx | /r/gentlemanboners/comments/7nolyx/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | -banned- | 2018-01-02 19:17:12 | Neutral | Neutral | 330.0 | NaN | submission | 2 | amber_heard | 2 | [] | 0 |
| 20 | t3_7nuh77 | /r/gentlemanboners/comments/7nuh77/lovely_duo_amber_heard_jessica_alba/ | Lovely Duo. Amber Heard & Jessica Alba | NaN | r/gentlemanboners | -banned- | 2018-01-03 12:58:23 | Positive | Positive | 570.0 | NaN | submission | 7 | lovely_duo_amber_heard_jessica_alba | 6 | [] | 0 |
| 24 | t3_7nxwrq | /r/gentlemanboners/comments/7nxwrq/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | -banned- | 2018-01-03 21:34:14 | Neutral | Neutral | 2.0 | NaN | submission | 2 | amber_heard | 2 | [] | 0 |
df_ah.subreddit.value_counts()
r/Celebs 350
r/gentlemanboners 296
r/DCEUboners 87
r/DC_Cinematic 85
r/goddesses 79
...
r/jessicaalba 1
r/OnOff 1
r/PickHerOutfit 1
r/JerkOffToDesiCelebs 1
r/celebdominatrixs 1
Name: subreddit, Length: 301, dtype: int64
df_ah.author.value_counts()
-banned- 829
AutoNewsAdmin 35
emilyguy 34
AutoNewspaperAdmin 33
ccrraapp 27
...
sonofcross 1
filmitalkies 1
JoLuGaLo 1
s0mnambulance 1
Clip_Dirtblade 1
Name: author, Length: 605, dtype: int64
df_ah[df_ah.author == 'AutoNewsAdmin'].subreddit.value_counts()
r/USATODAYauto 9 r/REUTERSauto 4 r/ABCauto 4 r/FOXauto 3 r/LATIMESauto 2 r/MIAMIHERALDauto 2 r/RTauto 1 r/SCMPauto 1 r/WAPOauto 1 r/NYTauto 1 r/INDEPENDENTauto 1 r/NEWSDAYauto 1 r/TWTauto 1 r/CBSauto 1 r/HOUSTONCHRONauto 1 r/BBCauto 1 r/NZHauto 1 Name: subreddit, dtype: int64
df_ah[df_ah.author == 'AutoNewsAdmin'].created_at.dt.date.value_counts()
2018-10-25 5 2018-10-22 4 2018-07-03 3 2018-12-06 2 2018-10-03 2 2018-04-05 2 2018-12-05 2 2018-12-19 2 2018-11-26 2 2018-06-25 1 2018-10-29 1 2018-09-14 1 2018-10-10 1 2018-12-03 1 2018-10-04 1 2018-11-27 1 2018-05-15 1 2018-12-20 1 2018-11-13 1 2018-12-12 1 Name: created_at, dtype: int64
df_ah[df_ah.author == 'emilyguy'].subreddit.value_counts()
r/gentlemanboners 19 r/Celebs 7 r/celebsnaked 4 r/celebnsfw 2 r/WatchItForThePlot 1 r/CaraDelevingne 1 Name: subreddit, dtype: int64
df_ah[df_ah.author == 'emilyguy'].created_at.dt.date.value_counts()
2018-03-22 3 2018-05-05 3 2018-10-05 2 2018-11-15 1 2018-11-01 1 2018-03-08 1 2018-06-11 1 2018-09-25 1 2018-12-22 1 2018-03-13 1 2018-05-16 1 2018-12-16 1 2018-05-01 1 2018-09-17 1 2018-05-25 1 2018-12-29 1 2018-08-26 1 2018-03-19 1 2018-07-04 1 2018-07-05 1 2018-03-24 1 2018-08-18 1 2018-07-23 1 2018-07-03 1 2018-06-28 1 2018-07-12 1 2018-04-13 1 2018-06-19 1 2018-10-17 1 Name: created_at, dtype: int64
text = "fuck amber heard"
df_fuc2 = df[df.text.str.lower().str.contains(text)]
print(df_fuc2.shape)
with pd.option_context('display.max_colwidth', None):
display(df_fuc2.head())
(4, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1726 | t1_e0830mk | /r/movies/comments/8ozhus/first_poster_london_fields_amber_heard_billy_bob/e0830mk/ | Fuck Amber Heard, she's a lier and a gold digger. | t3_8ozhus | r/movies | -banned- | 2018-06-06 18:59:57 | Negative | Negative | -6.0 | submission | comment | 10 | first_poster_london_fields_amber_heard_billy_bob | 8 | [] | 0 |
| 1776 | t1_e091jem | /r/movies/comments/8p6bah/london_fields_official_trailer_2018_amber_heard/e091jem/ | Fuck Amber Heard! | t3_8p6bah | r/movies | -banned- | 2018-06-07 04:15:22 | Negative | Negative | 1.0 | submission | comment | 3 | london_fields_official_trailer_2018_amber_heard | 7 | [] | 0 |
| 4287 | t1_e79113k | /r/goddesses/comments/9ljspc/amber_heard/e79113k/ | Fuck Amber Heard | t3_9ljspc | r/goddesses | -banned- | 2018-10-06 01:25:58 | Negative | Negative | 3.0 | submission | comment | 3 | amber_heard | 2 | [] | 0 |
| 4293 | t1_e7dgj5g | /r/goddesses/comments/9maij0/amber_heard/e7dgj5g/ | Fuck Amber Heard | t3_9maij0 | r/goddesses | -banned- | 2018-10-08 03:33:40 | Negative | Negative | 0.0 | submission | comment | 3 | amber_heard | 2 | [] | 0 |
df_fuc2.author.value_counts()
-banned- 4 Name: author, dtype: int64
df_fuc2.subreddit.value_counts()
r/goddesses 2 r/movies 2 Name: subreddit, dtype: int64
df_fuc2.submission_comment.value_counts()
comment 4 Name: submission_comment, dtype: int64
df_nc = df.query(" author == 'Night_Chicken'")
df_nc = df_nc.sort_values('created_at')
print(df_nc.shape)
df_nc.head(2)
(135, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 68 | t1_dslw4u6 | /r/Celebs/comments/7pysyp/amber_heard/dslw4u6/ | What? What did she hear? | t3_7pysyp | r/Celebs | Night_Chicken | 2018-01-13 04:48:41 | Neutral | Neutral | -6.0 | submission | comment | 5 | amber_heard | 2 | [] | 0 |
| 659 | t1_dvufzj9 | /r/Celebs/comments/844rw7/amber_heard/dvufzj9/ | What? Thank you! | t1_dvn0nfy | r/Celebs | Night_Chicken | 2018-03-17 13:22:00 | Neutral | Positive | 1.0 | comment | comment | 3 | amber_heard | 2 | [] | 0 |
df_nc.text.value_counts()
What? What did she hear? 121 What? What did she hear? 4 I want to know as well. 1 What? Thank you! 1 Excellent! Not as exciting as the crunching and jostling sounds in Elon's full pockets which escaped her grasp. 1 Exactly.\n\n​ 1 What? What did she hear? 1 What? What did she hear? 1 Good question.\n\n​ 1 What? What did she hear? Oh. The Birthday song. 1 Yes. I want to know. 1 Yes! What? 1 Name: text, dtype: int64
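Note that `value_counts()` above splits "What? What did she hear?" into several buckets that differ only in internal whitespace. Collapsing runs of whitespace before counting merges them; this is a minimal sketch of that normalization on toy data.

```python
import pandas as pd

# toy texts that differ only in internal whitespace
texts = pd.Series([
    'What? What did she hear?',
    'What?  What did she hear?',    # double space
    'What? What did she  hear?',
])
# collapse all whitespace runs to a single space, then strip the ends
normalized = texts.str.replace(r'\s+', ' ', regex=True).str.strip()
counts = normalized.value_counts()
# all three variants collapse into one bucket of size 3
```

Applying the same normalization to `df_nc.text` before `value_counts()` would report the 127 whitespace variants of the phrase as a single count.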
Check the number of words per contribution
Short texts with only a few words are easier for bots to generate automatically
df['text_words'].value_counts().head(10);
px.histogram(df['text_words'].to_frame(), x="text_words",title='number of words in each contribution',
nbins=200).update_traces(marker_color='#5296dd')
The number of parent comments on submissions¶
px.bar(data_frame=df['top_level'].value_counts().to_frame().reset_index(),
x="index", y="top_level").update_layout(title='Comment or Submission (Top Level of Contribution "Parent")',
xaxis_title='contribution top level (parent) category',
yaxis_title='number of contributions').update_traces(marker_color='#5296dd')
Investigating the Submission Text
(Submissions with the most comments and replies)
We can count the distinct submissions by looking only at the submission rows
We can also look at the submission_text values with the most interactions (the most repeated submission_text)
df['submission_text'].value_counts().head(20)
amber_heard 2490 DC_Cinematic 760 amber_heard_received_death_threats_and_was 254 amber_heard_the_informers 167 just_a_friendly_reminder_that_domestic_abuse_is 102 amber_heard_fans_upset_after_she_posted_a_racist 96 johnny_depp_claims_ex_amber_heard_punched_him 95 amber_heard_mirin_jason_momoa 77 johnny_depp_accuses_amber_heard_of_shitting_in 76 amber_heard_london_fields_2018 71 first_poster_london_fields_amber_heard_billy_bob 69 aquaman_actress_amber_heard_gets_called_racist 65 aquaman_amber_heard 64 new_aquaman_images_shows_jason_momoa_and_amber 63 amber_heard_film_london_fields_suffers_one_of_the 61 amber_heard_has_an_amazing_body 60 2018_comiccon_red_carpet_gal_gadot_melissa 51 amber_heard_stunning_in_red 47 personally_i_think_it_takes_a_lot_of_skill_to 44 amber_heard_in_the_informers 43 Name: submission_text, dtype: int64
df_amber = df.query(" submission_text == 'amber_heard' & \
submission_comment == 'submission' ")
print(df_amber.shape)
with pd.option_context('display.max_colwidth', None):
display(df_amber.head())
(1044, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | t3_7nkbt3 | /r/gentlemanboners/comments/7nkbt3/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | ZadocPaet | 2018-01-02 05:07:43 | Neutral | Neutral | 5.0 | NaN | submission | 2 | amber_heard | 2 | [] | 0 |
| 11 | t3_7nkbua | /r/DCEUboners/comments/7nkbua/amber_heard/ | Amber Heard | NaN | r/DCEUboners | ZadocPaet | 2018-01-02 05:07:55 | Neutral | Neutral | 45.0 | NaN | submission | 2 | amber_heard | 2 | [] | 0 |
| 16 | t3_7nolyx | /r/gentlemanboners/comments/7nolyx/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | -banned- | 2018-01-02 19:17:12 | Neutral | Neutral | 330.0 | NaN | submission | 2 | amber_heard | 2 | [] | 0 |
| 24 | t3_7nxwrq | /r/gentlemanboners/comments/7nxwrq/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | -banned- | 2018-01-03 21:34:14 | Neutral | Neutral | 2.0 | NaN | submission | 2 | amber_heard | 2 | [] | 0 |
| 29 | t3_7o2ahb | /r/Celebs/comments/7o2ahb/amber_heard/ | Amber Heard | NaN | r/Celebs | -banned- | 2018-01-04 11:03:29 | Neutral | Neutral | 55.0 | NaN | submission | 2 | amber_heard | 2 | [] | 0 |
1044 Different Submissions
li = list(df_amber.author.unique())
li.remove('-banned-')
df_users[df_users.user_name.isin(li)]
| user_name | has_verified_email | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | |
|---|---|---|---|---|---|---|---|---|---|---|
| 29 | RyanSmith | True | True | False | False | 263666.0 | 5989322.0 | 2006-08-07 14:18:44 | others | others |
| 230 | soundsoul | True | True | False | False | 4156.0 | 238286.0 | 2008-01-04 02:52:49 | others | others |
| 1488 | littlemisfit | True | True | False | False | 9523.0 | 602311.0 | 2010-09-17 22:11:14 | others | others |
| 1520 | FlexOutlaw | True | False | False | False | 5217.0 | 154864.0 | 2010-09-25 18:36:46 | others | others |
| 2974 | jarakacha | True | True | False | False | 142.0 | 153129.0 | 2011-07-20 21:10:48 | others | others |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 66206 | jasontheblogger2018 | True | True | True | True | NaN | NaN | NaT | banned | banned |
| 66258 | Snappleman87 | True | True | True | True | NaN | NaN | NaT | banned | banned |
| 66342 | 88MPH1 | True | True | True | True | NaN | NaN | NaT | banned | banned |
| 68878 | armpit-lover | True | True | True | True | NaN | NaN | NaT | banned | banned |
| 70281 | HellsJuggernaut | True | True | True | True | NaN | NaN | NaT | banned | banned |
197 rows × 10 columns
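Since `df_merged` carries both a contribution timestamp (`created_at`) and the account creation date (`user_created_at`), account age at posting time is another useful bot signal: throwaway accounts created shortly before posting are suspicious. This is a sketch of that check on toy rows mirroring those two columns; the column names match the merged dataset, the 7-day threshold is an arbitrary illustration.

```python
import pandas as pd

# toy rows mirroring df_merged's created_at / user_created_at columns
toy = pd.DataFrame({
    'author': ['fresh_account', 'old_account'],
    'created_at': pd.to_datetime(['2018-08-29 04:00:00', '2018-08-29 04:00:00']),
    'user_created_at': pd.to_datetime(['2018-08-28 12:00:00', '2010-01-01 00:00:00']),
})
# whole days between account creation and the contribution
toy['account_age_days'] = (toy.created_at - toy.user_created_at).dt.days
# flag accounts younger than a week at posting time
suspicious = toy[toy.account_age_days < 7]
# only 'fresh_account' is flagged: it was created less than a day before posting
```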
df_mera_comments = df.query(" submission_text == 'amber_heard' ")
print(df_mera_comments.shape)
df_mera_comments.head(1)
(2490, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | t1_ds0ylh7 | /r/Celebs/comments/7naj20/amber_heard/ds0ylh7/ | The arm pit looks like a badly shaved vag! | t3_7naj20 | r/Celebs | vastio67 | 2018-01-01 04:44:17 | Negative | Neutral | 0.0 | submission | comment | 9 | amber_heard | 2 | [] | 0 |
df_mera_contributions = df_mera_comments.groupby(df_mera_comments.created_at.dt.date).size().reset_index(name='n_contributions')
fig = px.bar(df_mera_contributions.head(7),
x='created_at',
             y='n_contributions', title='The number of contributions per day on these submissions')
fig.update_layout(
xaxis = dict(
title='Contribution Date',
tickmode = 'array',
tickvals = df_mera_contributions.head(7).created_at,
)
)
clrs = ['red' if (y > 200) else '#5296dd' for y in df_mera_contributions.n_contributions]
fig.update_traces(marker_color=clrs, marker_line_width=1.5, opacity=1, textposition='auto')
fig.show()
df_mera_authors = df_mera_comments.groupby(df_mera_comments.author).size().reset_index(name='n_contributions')
fig = px.bar(df_mera_authors,
x='author',
y='n_contributions', title='The number of contributions per author on these submissions')
fig.update_traces(marker_color='#5296dd', opacity=1, textposition='auto')
# , marker_line_color='#5296dd', marker_line_width=2
fig.update_yaxes(range = [0,25])
fig.show()
df_mera_comments[df_mera_comments.author == 'Night_Chicken']
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 68 | t1_dslw4u6 | /r/Celebs/comments/7pysyp/amber_heard/dslw4u6/ | What? What did she hear? | t3_7pysyp | r/Celebs | Night_Chicken | 2018-01-13 04:48:41 | Neutral | Neutral | -6.0 | submission | comment | 5 | amber_heard | 2 | [] | 0 |
| 659 | t1_dvufzj9 | /r/Celebs/comments/844rw7/amber_heard/dvufzj9/ | What? Thank you! | t1_dvn0nfy | r/Celebs | Night_Chicken | 2018-03-17 13:22:00 | Neutral | Positive | 1.0 | comment | comment | 3 | amber_heard | 2 | [] | 0 |
| 660 | t1_dvug20l | /r/Celebs/comments/81y6lx/amber_heard/dvug20l/ | What? What did she hear? | t3_81y6lx | r/Celebs | Night_Chicken | 2018-03-17 13:23:50 | Neutral | Neutral | 1.0 | submission | comment | 5 | amber_heard | 2 | [] | 0 |
| 3297 | t1_e3uftxn | /r/Celebs/comments/94w2gm/amber_heard/e3uftxn/ | What? What did she hear? | t3_94w2gm | r/Celebs | Night_Chicken | 2018-08-08 20:07:11 | Neutral | Neutral | 1.0 | submission | comment | 5 | amber_heard | 2 | [] | 0 |
| 3298 | t1_e3ufvla | /r/Celebs/comments/94tmn2/amber_heard/e3ufvla/ | What? What did she hear? | t3_94tmn2 | r/Celebs | Night_Chicken | 2018-08-08 20:07:50 | Neutral | Neutral | 1.0 | submission | comment | 5 | amber_heard | 2 | [] | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4446 | t1_e834yvb | /r/Celebs/comments/9pcp29/amber_heard/e834yvb/ | What? What did she hear? | t3_9pcp29 | r/Celebs | Night_Chicken | 2018-10-19 21:47:03 | Neutral | Neutral | 0.0 | submission | comment | 5 | amber_heard | 2 | [] | 0 |
| 4644 | t1_e8vhnfn | /r/Celebs/comments/9t9agd/amber_heard/e8vhnfn/ | Yes. I want to know. | t1_e8utt9m | r/Celebs | Night_Chicken | 2018-11-01 21:28:46 | Neutral | Positive | 1.0 | comment | comment | 5 | amber_heard | 2 | [] | 0 |
| 4931 | t1_e9jg966 | /r/Celebs/comments/9vyy48/amber_heard/e9jg966/ | I want to know as well. | t1_e9h3ijf | r/Celebs | Night_Chicken | 2018-11-12 06:36:51 | Neutral | Positive | 1.0 | comment | comment | 6 | amber_heard | 2 | [] | 0 |
| 5371 | t1_eatt7s2 | /r/Celebs/comments/a1u9h8/amber_heard/eatt7s2/ | Good question.\n\n​ | t1_easxyfi | r/Celebs | Night_Chicken | 2018-12-01 01:52:52 | Positive | Positive | 0.0 | comment | comment | 3 | amber_heard | 2 | [] | 0 |
| 5597 | t1_ebfi2pr | /r/Celebs/comments/a4k0m5/amber_heard/ebfi2pr/ | Excellent! Not as exciting as the crunching a... | t1_ebfcml3 | r/Celebs | Night_Chicken | 2018-12-09 15:05:57 | Positive | Neutral | 1.0 | comment | comment | 18 | amber_heard | 2 | [] | 0 |
122 rows × 17 columns
df_mera_comments[df_mera_comments.author == 'emilyguy'].shape
(34, 17)
df_dc = df.query(" submission_text == 'DC_Cinematic' & \
submission_comment == 'submission' ")
print(df_dc.shape)
with pd.option_context('display.max_colwidth', None):
display(df_dc.head())
(47, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 190 | t3_7vr5ia | /r/DC_Cinematic/comments/7vr5ia/discussion_amber_heard_playing_mera_rubs_me_the/ | DISCUSSION: Amber Heard playing Mera rubs me the wrong way | NaN | r/DC_Cinematic | ZorakLocust | 2018-02-06 22:16:01 | Negative | Neutral | 0.0 | NaN | submission | 10 | DC_Cinematic | 2 | [] | 0 |
| 479 | t3_82z9f0 | /r/DC_Cinematic/comments/82z9f0/rumor_aquaman_press_tour_with_amber_heard_and/ | RUMOR: Aquaman press tour with Amber Heard and Jason Momoa is starting SOON in Europe! | NaN | r/DC_Cinematic | Sabya2kMukherjee | 2018-03-08 17:30:32 | Neutral | Neutral | 125.0 | NaN | submission | 15 | DC_Cinematic | 2 | [] | 0 |
| 859 | t3_89jugi | /r/DC_Cinematic/comments/89jugi/other_amber_heard_visits_children_in_syrian/ | OTHER: Amber Heard visits children in Syrian refugee camp of Zaatari | NaN | r/DC_Cinematic | -banned- | 2018-04-03 23:21:48 | Negative | Neutral | 1.0 | NaN | submission | 11 | DC_Cinematic | 2 | [] | 0 |
| 887 | t3_8bbokl | /r/DC_Cinematic/comments/8bbokl/news_heroes_onscreen_and_off_amber_heard_donates/ | NEWS: Heroes onscreen and off: Amber Heard donates to children's hospital | NaN | r/DC_Cinematic | -banned- | 2018-04-10 21:40:38 | Neutral | Positive | 81.0 | NaN | submission | 11 | DC_Cinematic | 2 | [] | 0 |
| 925 | t3_8c2e0d | /r/DC_Cinematic/comments/8c2e0d/social_media_amber_heard_aquafied_yet_again/ | Social Media: Amber Heard: Aquafied yet again | NaN | r/DC_Cinematic | Mohamed_Todd | 2018-04-13 20:33:56 | Positive | Neutral | 163.0 | NaN | submission | 7 | DC_Cinematic | 2 | [] | 0 |
47 distinct submissions
li = list(df_dc.author.unique())
df_users[df_users.user_name.isin(li)]
| user_name | has_verified_email | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | |
|---|---|---|---|---|---|---|---|---|---|---|
| 2981 | indian22 | True | False | False | False | 6300.0 | 69586.0 | 2011-07-21 10:57:48 | others | others |
| 3846 | -banned- | True | False | False | False | 77234.0 | 82.0 | 2011-11-06 21:08:20 | others | others |
| 10847 | boumtjeboo | True | False | False | False | 89482.0 | 116498.0 | 2013-08-13 21:48:05 | others | others |
| 11664 | GlowInThe | True | False | False | False | 27992.0 | 87943.0 | 2013-10-29 06:50:39 | others | others |
| 13934 | Rugby11 | True | True | False | False | 2397.0 | 114562.0 | 2014-05-19 17:41:26 | others | others |
| 19407 | Richardrumeo | True | False | False | False | 14716.0 | 40662.0 | 2015-08-18 18:01:48 | others | others |
| 20990 | Mohamed_Todd | True | False | False | False | 7017.0 | 28735.0 | 2016-01-01 21:31:20 | others | others |
| 21016 | BeenTryin | True | False | False | False | 29570.0 | 313003.0 | 2016-01-03 17:29:07 | others | others |
| 21386 | ZorakLocust | True | False | False | False | 29289.0 | 10356.0 | 2016-01-30 19:10:14 | others | others |
| 24215 | Sabya2kMukherjee | True | False | False | False | 55889.0 | 37522.0 | 2016-07-31 18:15:40 | others | others |
| 25133 | GamesFictionFan | True | False | False | False | 16722.0 | 9220.0 | 2016-09-22 21:50:48 | others | others |
| 27036 | RobustBender | True | False | False | False | 54570.0 | 3062.0 | 2017-01-06 03:14:00 | others | others |
| 30721 | AldebaranTauro | True | False | False | False | 4626.0 | 376091.0 | 2017-07-19 01:17:55 | others | others |
| 32853 | atulsachdeva | True | True | False | False | 17528.0 | 4660.0 | 2017-11-12 23:06:55 | others | others |
| 34463 | Kal_sai | True | True | False | False | 34327.0 | 41936.0 | 2018-01-27 13:08:45 | others | 2018 |
| 37226 | Staticrealms | True | False | False | False | 261.0 | 4027.0 | 2018-05-29 14:57:52 | others | 2018 |
| 38850 | kyotomafia1997 | True | False | False | False | 1642.0 | 5462.0 | 2018-08-01 04:30:18 | others | 2018 |
| 41777 | ThatBrandingGuy | True | False | False | False | 7689.0 | 29968.0 | 2018-11-11 19:17:29 | others | 2018 |
| 66041 | Hakim36 | True | True | True | True | NaN | NaN | NaT | banned | banned |
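The `isin` lookup above silently drops any author who has no row in the users table (deleted or unscraped accounts). A hedged alternative sketch, using the column names from this notebook's data: a left merge with `indicator=True` keeps those authors visible.

```python
import pandas as pd

def authors_with_user_data(contributions, users):
    """Join contribution authors to the users table, keeping authors
    that have no matching user row so they can be inspected separately.

    Column names (`author`, `user_name`) follow this notebook's data.
    """
    authors = contributions[['author']].drop_duplicates()
    merged = authors.merge(users, left_on='author', right_on='user_name',
                           how='left', indicator=True)
    # authors present in the contributions but absent from the users data
    missing = merged.loc[merged['_merge'] == 'left_only', 'author']
    return merged, missing
```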
df_dc_comments = df.query(" submission_text == 'DC_Cinematic' & \
submission_comment == 'comment' ")
print(df_dc_comments.shape)
df_dc_comments.head(1)
(713, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 191 | t1_dtug451 | /r/DC_Cinematic/comments/7vr5ia/discussion_amb... | She's hot af though lol | t3_7vr5ia | r/DC_Cinematic | Speekeazyyyy | 2018-02-06 22:21:06 | Positive | Positive | 23.0 | submission | comment | 5 | DC_Cinematic | 2 | [] | 0 |
df_dc_comments = df_dc_comments.groupby(df_dc_comments.created_at.dt.date).size().reset_index(name='n_contributions')
fig = px.bar(df_dc_comments.head(4),
x='created_at',
y='n_contributions', title='The number of contributions/date on these submissions')
fig.update_layout(
xaxis = dict(
title='Contribution Date',
tickmode = 'array',
tickvals = df_dc_comments.head(4).created_at,
)
)
clrs = ['red' if (y > 200) else '#5296dd' for y in df_dc_comments.n_contributions]
fig.update_traces(marker_color=clrs, marker_line_width=1.5, opacity=1, textposition='auto')
fig.show()
df_dc_comments = df.query(" submission_text == 'DC_Cinematic' & \
submission_comment == 'comment' ")
df_dc_authors = df_dc_comments.groupby(df_dc_comments.author).size().reset_index(name='n_contributions')
df_dc_authors.sort_values('n_contributions', ascending=False).head(10)
| author | n_contributions | |
|---|---|---|
| 0 | -banned- | 122 |
| 184 | ZorakLocust | 16 |
| 31 | Chronos2016 | 15 |
| 115 | NaveHarder | 14 |
| 14 | AutoModerator | 12 |
| 93 | KAIZOKUGARI23 | 12 |
| 270 | serjon_arryn | 10 |
| 78 | Hydrostorm9 | 9 |
| 22 | BatmanNewsChris | 8 |
| 34 | CliffordMoreau | 8 |
fig = px.bar(df_dc_authors,
x='author',
y='n_contributions', title='The number of contributions per author on these submissions')
fig.update_traces(marker_color='#5296dd', opacity=1, textposition='auto')
# , marker_line_color='#5296dd', marker_line_width=2
fig.update_yaxes(range = [0,25])
fig.show()
df_dc_comments[df_dc_comments.author == 'ZorakLocust'].text.head(5).values
array(['Could you perhaps elaborate on what’s so dumb about it? ',
'That’s not what her Wikipedia page says. ',
'Notice that I put “abused” in quotations. It’s because I don’t believe she’s an actual abuse victim. I think she’s a manipulative gold digger who wanted some easy publicity. This is in no way meant to discredit abuse victims but anyone who blindly believes that Heard is one of them is not doing anyone any favors. ',
'I would only be victim blaming if Amber Heard actually was a victim. ',
'I addressed your point about how no one would want to be an abuse victim. My response was that while that’s true, there is evidence to suggest that Amber Heard is not a victim of anything. \n\nAlso, if you want an answer as to why she would wait a year (because that’s how long they were married) before accusing Depp of domestic abuse, I don’t know the answer to that. Still, it seems suspicious as hell that she was apparently demanding money out of him. \n\nAnyway, would you perhaps care to address what I mentioned about the video she posted to “prove” she was abused? Specifically how it doesn’t actually show Johnny Depp abusing her? '],
dtype=object)
df_dc_comments[df_dc_comments.author == 'Chronos2016'].text.head(5).values
array(['Not a big change acting wise but she would look cool as her too. ',
"no it's a wig.",
"I remember being in like 9th grade when Amber Heard came out as bisexual. It was 2008 and it was a pretty major thing for a celebrity to say. That was my first introduction to her.\n\nI really liked her makeup looks and her fashion back then and she's probably one of the celebs who got me into makeup. Her ad campaign for Guess was also iconic and the campaign video is often used in fan videos for leaked Lana Del Rey songs. \n\nShe's one of the more glamorous celebs out there and she super well loved amongst the sad indie girl circles online.",
"I think she played Seth Rogen's gf in Pineapple Express. It was a pretty small and thankless role. ",
"Don't do that. They both put out a joint statement saying that both sides told the truth. Johnny is being petty and going back on that statement. \n\nWe may never know what happened between the two of them, clearly it was a toxic relationship. Amber is trying her best to move on but Johnny still wants to drag this out. He should take a page from Amber and move on himself."],
dtype=object)
df_death = df.query(" submission_text == 'amber_heard_received_death_threats_and_was' & \
submission_comment == 'submission' ")
print(df_death.shape)
with pd.option_context('display.max_colwidth', None):
display(df_death.head())
(1, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6103 | t3_a7lypo | /r/MensRights/comments/a7lypo/amber_heard_received_death_threats_and_was/ | Amber Heard received death threats and was 'blacklisted' after accusing Johnny Depp of abuse | Actress who went public with a false accusation is shocked to discover her lies have negative consequences - for her, not for Depp. | NaN | r/MensRights | EricAllonde | 2018-12-19 12:26:42 | Negative | Negative | 2605.0 | NaN | submission | 38 | amber_heard_received_death_threats_and_was | 7 | [] | 0 |
Only one submission.
df_death.permalink.values[0]
'/r/MensRights/comments/a7lypo/amber_heard_received_death_threats_and_was/'
li = list(df_death.author.unique())
df_users[df_users.user_name.isin(li)]
| user_name | has_verified_email | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | |
|---|---|---|---|---|---|---|---|---|---|---|
| 23311 | EricAllonde | True | False | False | False | 117967.0 | 409733.0 | 2016-06-12 01:05:21 | others | others |
df_death_comments = df.query(" submission_text == 'amber_heard_received_death_threats_and_was' & \
submission_comment == 'comment' ")
print(df_death_comments.shape)
df_death_comments.head(1)
(253, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6106 | t1_ec3z6h3 | /r/MensRights/comments/a7lypo/amber_heard_rece... | This article doesn't indicate whether these al... | t3_a7lypo | r/MensRights | DownvotedByShitters | 2018-12-19 13:15:12 | Positive | Neutral | 413.0 | submission | comment | 416 | amber_heard_received_death_threats_and_was | 7 | ['https://www.theguardian.com'] | 1 |
df_death_comments = df_death_comments.groupby(df_death_comments.created_at.dt.date).size().reset_index(name='n_contributions')
fig = px.bar(df_death_comments.head(4),
x='created_at',
y='n_contributions', title='The number of contributions/date on these submissions')
fig.update_layout(
xaxis = dict(
title='Contribution Date',
tickmode = 'array',
tickvals = df_death_comments.head(4).created_at,
)
)
clrs = ['red' if (y > 200) else '#5296dd' for y in df_death_comments.n_contributions]
fig.update_traces(marker_color=clrs, marker_line_width=1.5, opacity=1, textposition='auto')
fig.show()
df_death_comments = df.query(" submission_text == 'amber_heard_received_death_threats_and_was' & \
submission_comment == 'comment' ")
df_death_authors = df_death_comments.groupby(df_death_comments.author).size().reset_index(name='n_contributions')
df_death_authors.sort_values('n_contributions', ascending=False).head(10)
| author | n_contributions | |
|---|---|---|
| 0 | -banned- | 30 |
| 91 | tenchineuro | 22 |
| 67 | j3utton | 13 |
| 37 | Rogdozz | 12 |
| 87 | scyth3s | 10 |
| 20 | LateNightTestPattern | 8 |
| 15 | GreyFox860 | 7 |
| 52 | _pseudodragon | 6 |
| 82 | purpleblossom | 5 |
| 57 | chambertlo | 5 |
fig = px.bar(df_death_authors,
x='author',
y='n_contributions', title='The number of contributions per author on these submissions')
fig.update_traces(marker_color='#5296dd', opacity=1, textposition='auto')
# , marker_line_color='#5296dd', marker_line_width=2
fig.update_yaxes(range = [0,25])
fig.show()
df_death_comments[df_death_comments.author == 'tenchineuro'].text.head(5).values
array(["> Explaining that she rarely left her apartment for fear of being hounded, she added: **'I felt as though I was on trial in the court of public opinion** - and my life and livelihood depended on myriad judgments far beyond my control.'\n\nWelcome to the world #metoo created.\n\nI suspect that she's exaggerating the career effects though, apparently it's OK for female actresses to sexually assault or assault their husbands/BFs.",
"> Yes, the burden of proof is on the accuser, but not having proof does not mean her allegations are false.\n\nIs that your default assumption? For how long have you been a feminist?\n\n> However, when someone qualifies those allegations as false, they are now making a new allegation about the initial allegations. The burden of proof - of proving the allegations are actually false, is now on them.\n\nSo if you deny the allegations then the burden is now on you to prove your innocence? You'd make a fine feminist lawyer, or maybe you are already?",
"> The allegations being unsubstantiated and the allegations being provably false are extremely different.\n\nThey don't have to be provably false, it's impossible to prove that something did not happen, they need to be proven true beyond the benefit of a doubt.",
'>> However, when someone qualifies those allegations as false, they are now making a new allegation about the initial allegations. The burden of proof - of proving the allegations are actually false, is now on them.\n\n> I\'m discussing how our criminal justice system is "supposed" to work and the philosophy behind it.\n\nNo, you are not, it is never the legal responsibility of the accused to prove their innocence. \n\nAnd claiming one\'s innocence does not create a burden of proving it.\n\nWhat you are saying is basically feminist jurisprudence.',
'>> They don\'t have to be provably false\n>\n> In order to be called "false allegations" they do.\n\nSo you believe that innocence needs to be proven, or what, you assume guilt? You are a fine advocate for false accusers BTW, you have lots of company on the feminist side of the fence.'],
dtype=object)
df_death_comments[df_death_comments.author == 'tenchineuro'].created_at.dt.date.value_counts()
2018-12-19 22 Name: created_at, dtype: int64
df_death_comments[df_death_comments.author == 'tenchineuro'].subreddit.value_counts()
r/MensRights 22 Name: subreddit, dtype: int64
tenchineuro made 22 comments in a single day (2018-12-19), all in one subreddit: r/MensRights.
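Night_Chicken posting the identical "What? What did she hear?" text across many submissions is a classic copy-paste signal. A minimal sketch of a per-author duplicate-text ratio, using this notebook's `author`/`text` columns; it only catches exact repeats, so paraphrased spam would need fuzzy matching.

```python
import pandas as pd

def duplicate_comment_ratio(frame, author_col='author', text_col='text'):
    """Share of each author's contributions whose text exactly repeats
    one of their own earlier contributions -- a rough copy-paste signal.
    """
    # True for every row that duplicates an earlier (author, text) pair
    dup = frame.duplicated(subset=[author_col, text_col], keep='first')
    return (dup.groupby(frame[author_col])
               .mean()
               .sort_values(ascending=False))
```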
Investigating the submissions with the most comments (top-level comments)¶
df.parent_id.value_counts().head()
t3_91hqrc 49 t3_a6i1j4 40 t3_a7lypo 36 t3_9uzs60 34 t3_8hx1bz 30 Name: parent_id, dtype: int64
fig = px.bar(df.parent_id.value_counts().to_frame().head(25).reset_index(), x="parent_id", y="index",
             height=500,
             title='Submissions with most comments (top-level comments)').update_layout(
                                                   xaxis_title='Number of comments',
                                                   yaxis_title='submission id').update_traces(marker_color='#5296dd')
fig.update_yaxes(autorange="reversed")
df_top5 = df_merged[df_merged.child_id.isin(df_merged.parent_id.value_counts().head().index)]
with pd.option_context('display.max_colwidth', None):
display(df_top5)
| child_id | permalink | text | parent_id | subreddit | created_at | sentiment_blob | sentiment_nltk | score | top_level | ... | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | diff | days_after_creation | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 369 | t3_8hx1bz | /r/WatchItForThePlot/comments/8hx1bz/amber_heard_the_informers/ | Amber Heard - The Informers | NaN | r/WatchItForThePlot | 2018-05-08 14:18:26 | Neutral | Neutral | 3749.0 | NaN | ... | False | False | False | 77234.0 | 82.0 | 2011-11-06 21:08:20 | others | others | 2374 days 17:10:06 | 2374.0 |
| 1134 | t3_9uzs60 | /r/WatchItForThePlot/comments/9uzs60/amber_heard_london_fields_2018/ | Amber Heard - London Fields [2018] | NaN | r/WatchItForThePlot | 2018-11-07 14:17:00 | Neutral | Neutral | 4015.0 | NaN | ... | False | False | False | 77234.0 | 82.0 | 2011-11-06 21:08:20 | others | others | 2557 days 17:08:40 | 2557.0 |
| 1408 | t3_a6i1j4 | /r/DC_Cinematic/comments/a6i1j4/other_amber_heard_is_painfully_beautiful_as_mera/ | OTHER: Amber Heard is painfully beautiful as Mera | NaN | r/DC_Cinematic | 2018-12-15 19:31:20 | Positive | Neutral | 2082.0 | NaN | ... | False | False | False | 77234.0 | 82.0 | 2011-11-06 21:08:20 | others | others | 2595 days 22:23:00 | 2595.0 |
| 4469 | t3_91hqrc | /r/TrueFMK/comments/91hqrc/2018_comiccon_red_carpet_gal_gadot_melissa/ | 2018 Comic-Con Red Carpet: Gal Gadot, Melissa Benoist, Amber Heard | NaN | r/TrueFMK | 2018-07-24 14:13:54 | Neutral | Neutral | 41.0 | NaN | ... | True | False | False | 24502.0 | 92265.0 | 2015-12-05 21:18:19 | others | others | 961 days 16:55:35 | 961.0 |
| 6430 | t3_a7lypo | /r/MensRights/comments/a7lypo/amber_heard_received_death_threats_and_was/ | Amber Heard received death threats and was 'blacklisted' after accusing Johnny Depp of abuse | Actress who went public with a false accusation is shocked to discover her lies have negative consequences - for her, not for Depp. | NaN | r/MensRights | 2018-12-19 12:26:42 | Negative | Negative | 2605.0 | NaN | ... | False | False | False | 117967.0 | 409733.0 | 2016-06-12 01:05:21 | others | others | 920 days 11:21:21 | 920.0 |
5 rows × 24 columns
df_merged.submission_text = df_merged.submission_text.str.replace('_', ' ')
# get a list with the top 5 submission text
top5_text = list(df_top5.text)
def compare(s):
    """Return True if s appears in any of the top-5 submission texts."""
    return any(s in text for text in top5_text)
mask = df_merged.submission_text.apply(compare)
df_top5_contributions1 = df_merged[mask]
df_top5_authors = df_top5_contributions1.groupby(df_top5_contributions1.user_name).size().reset_index(name='n_contributions')
fig = px.bar(df_top5_authors,
x='user_name',
            y='n_contributions', title='The number of comments per author on these submissions')
fig.update_traces(marker_color='#5296dd', opacity=1, textposition='auto')
fig.show()
df_top5_contributions1.shape
(33, 24)
df_top5_contributions2 = df_merged[df_merged.parent_id.isin(df_merged.parent_id.value_counts().head().index)]
df_top5_authors = df_top5_contributions2.groupby(df_top5_contributions2.user_name).size().reset_index(name='n_contributions')
fig = px.bar(df_top5_authors,
x='user_name',
y='n_contributions', title='The number of parent comments per author on these submissions')
fig.update_traces(marker_color='#5296dd', opacity=1, textposition='auto')
# , marker_line_color='#5296dd', marker_line_width=2
fig.update_yaxes(range = [0,5])
fig.show()
NOTE: There are 32 parent comments from banned accounts
df_top5_contributions2.shape
(189, 24)
Investigating the authors with the most submissions¶
df_submissions = df[df.submission_comment == 'submission']
print(df_submissions.shape)
df_submissions.head(2)
(2000, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | t3_7nkbt3 | /r/gentlemanboners/comments/7nkbt3/amber_heard/ | Amber Heard | NaN | r/gentlemanboners | ZadocPaet | 2018-01-02 05:07:43 | Neutral | Neutral | 5.0 | NaN | submission | 2 | amber_heard | 2 | [] | 0 |
| 11 | t3_7nkbua | /r/DCEUboners/comments/7nkbua/amber_heard/ | Amber Heard | NaN | r/DCEUboners | ZadocPaet | 2018-01-02 05:07:55 | Neutral | Neutral | 45.0 | NaN | submission | 2 | amber_heard | 2 | [] | 0 |
df_submissions.author.value_counts().nlargest(n=10)
-banned- 794 AutoNewsAdmin 35 emilyguy 34 AutoNewspaperAdmin 33 ccrraapp 27 Rednaxela117 25 jeff98379 21 vonmark955 20 InfiniTitans 18 ZadocPaet 17 Name: author, dtype: int64
df_submissions.author.value_counts().to_frame().head(10)
| author | |
|---|---|
| -banned- | 794 |
| AutoNewsAdmin | 35 |
| emilyguy | 34 |
| AutoNewspaperAdmin | 33 |
| ccrraapp | 27 |
| Rednaxela117 | 25 |
| jeff98379 | 21 |
| vonmark955 | 20 |
| InfiniTitans | 18 |
| ZadocPaet | 17 |
fig = px.bar(df_submissions.author.value_counts().to_frame().head(10).reset_index(), x="author", y="index",
height=500,
             title='Authors with most submissions').update_layout(
                                                   xaxis_title='Number of submissions',
                                                   yaxis_title='Author name').update_traces(marker_color='#5296dd')
fig.update_yaxes(autorange="reversed")
Check whether the users with the most submissions are mods, gold members, or have a verified email¶
df_submissions.author.value_counts().head()
-banned- 794 AutoNewsAdmin 35 emilyguy 34 AutoNewspaperAdmin 33 ccrraapp 27 Name: author, dtype: int64
check_list = df_submissions.author.value_counts().nlargest(n=25).index.tolist()[1:]
check_list
['AutoNewsAdmin', 'emilyguy', 'AutoNewspaperAdmin', 'ccrraapp', 'Rednaxela117', 'jeff98379', 'vonmark955', 'InfiniTitans', 'ZadocPaet', 'MightUlt-7', 'FlexOutlaw', 'Queen1110', 'AngelaStettner69', 'Pm-me-your-ass-photo', 'GRJR721', 'naughtytwd', 'vonjobi951', 'Ezio9619', 'horny_fuckers', 'Luke5to1', 'iDevice_Help', 'pitsnbush', 'windowmedia', 'sagar7854']
# get a data frame with the users who made the most submissions
# (the '-banned-' placeholder at the top of the list is skipped)
df_check = df_users[df_users['user_name'].isin(check_list)]
print(df_check.shape)
df_check.head(2)
(24, 10)
| user_name | has_verified_email | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | |
|---|---|---|---|---|---|---|---|---|---|---|
| 1520 | FlexOutlaw | True | False | False | False | 5217.0 | 154864.0 | 2010-09-25 18:36:46 | others | others |
| 3937 | GRJR721 | True | True | False | False | 891.0 | 167792.0 | 2011-11-15 20:48:23 | others | others |
df_check['user_name'].nunique()
24
get_stats(df_check)
The value counts of the users with the most contributions: has_verified_email True 22 False 2 Name: has_verified_email, dtype: int64 The value counts of the users with the most contributions: is_mod True 18 False 6 Name: is_mod, dtype: int64 The value counts of the users with the most contributions: is_gold False 15 True 9 Name: is_gold, dtype: int64 The value counts of the users with the most contributions: is_banned False 17 True 7 Name: is_banned, dtype: int64 The min of comment_karma -1.0 The max of comment_karma 280938.0 The mean of comment_karma 23877.53 The median of comment_karma 23877.53 The min of link_karma 242.0 The max of link_karma 2838485.0 The mean of link_karma 529420.82 The median of link_karma 529420.82 The value counts of the users with the most contributions: banned_unverified others 15 banned 7 unverified 2 Name: banned_unverified, dtype: int64 The value counts of the users with the most contributions: creation_year others 13 banned 7 2018 4 Name: creation_year, dtype: int64
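`get_stats` comes from the local `helpers` module, whose source is not shown here, so this is a best-guess reconstruction from the printed output above. (The output prints identical mean and median, which hints that the original helper reprints the mean; this sketch computes both.)

```python
import pandas as pd

def get_stats(df_check):
    """Print value counts for the categorical user flags and summary
    statistics for the karma columns; returns the numeric stats too.
    Column names follow this notebook's users data."""
    cat_cols = ['has_verified_email', 'is_mod', 'is_gold', 'is_banned',
                'banned_unverified', 'creation_year']
    for col in cat_cols:
        print('The value counts of the users with the most contributions:')
        print(df_check[col].value_counts())
    stats = {}
    for col in ['comment_karma', 'link_karma']:
        stats[col] = {'min': df_check[col].min(),
                      'max': df_check[col].max(),
                      'mean': round(df_check[col].mean(), 2),
                      'median': round(df_check[col].median(), 2)}
        for name, value in stats[col].items():
            print(f'The {name} of {col}', value)
    return stats
```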
df['urls'].nunique()
113
df[df.astype(str)['urls'] != '[]'].head(2)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | t1_ds0x0jx | /r/elonmusk/comments/7n76bc/amber_heard_and_el... | Here's a sneak peek of /r/MGTOW using the [top... | t1_ds0x0d9 | r/elonmusk | sneakpeekbot | 2018-01-01 03:55:47 | Negative | Negative | 3.0 | comment | comment | 64 | amber_heard_and_elon_musk_spotted_vacationing_in | 8 | ['https://np.reddit.com', 'https://i.imgur.com... | 9 |
| 14 | t1_ds2okme | /r/elonmusk/comments/7n76bc/amber_heard_and_el... | Apparently common sense has escaped Elon Musk ... | t1_ds1q00a | r/elonmusk | -banned- | 2018-01-02 10:01:07 | Negative | Neutral | 3.0 | comment | comment | 62 | amber_heard_and_elon_musk_spotted_vacationing_in | 8 | ['http://docs.cpuc.ca.gov'] | 1 |
df['urls'].astype('str').value_counts().head()
[] 6761 ['https://t.co'] 22 ['https://www.reddit.com'] 21 ['https://youtu.be'] 10 ['https://www.youtube.com'] 9 Name: urls, dtype: int64
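The `urls` column stores only the scheme-plus-domain root of each link (e.g. `'https://t.co'`, path dropped). A sketch of how such roots could be extracted from raw comment text with a regex; the exact preprocessing used to build the column is not shown, so this is an assumption, and Reddit markdown links would need extra unescaping.

```python
import re

# match the scheme + host of a URL, stopping before any path
URL_ROOT = re.compile(r'https?://[^/\s)\]]+')

def extract_url_roots(text):
    """Return the scheme+domain part of every URL found in `text`."""
    return URL_ROOT.findall(text)
```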
# value counts of the number of URLs per contribution
df['urls_count'].value_counts();
fig = px.histogram(df['urls_count'].to_frame(), x="urls_count", title='Number of URLs per contribution',
                  nbins=130).update_traces(marker_color='#5296dd')
fig.update_layout(
    xaxis = dict(
        tickmode = 'array',
        tickvals = sorted(df['urls_count'].unique()),
    )
)
fig.show()
px.histogram(df[~(df['urls_count'].isin([0,60,20,16]))], x="urls_count", title='Number of URLs per contribution',
            nbins=20).update_traces(marker_color='#5296dd')
Check the number of words in the submission text.
Short, formulaic titles are easier for bots to generate.
fig = px.histogram(df['submission_words'].to_frame(), x="submission_words",
title='number of words in submission text',
nbins=50).update_traces(marker_color='#5296dd')
fig.update_layout(
xaxis = dict(
title='Number of submission words',
tickmode = 'linear',
)
)
Most used Subreddits¶
df['subreddit'].nunique()
302
df.subreddit.value_counts().to_frame().head(20).reset_index()
| index | subreddit | |
|---|---|---|
| 0 | r/gentlemanboners | 950 |
| 1 | r/Celebs | 849 |
| 2 | r/DC_Cinematic | 760 |
| 3 | r/movies | 386 |
| 4 | r/WatchItForThePlot | 368 |
| 5 | r/JerkOffToCelebs | 285 |
| 6 | r/celebnsfw | 258 |
| 7 | r/MensRights | 255 |
| 8 | r/entertainment | 171 |
| 9 | r/goddesses | 145 |
| 10 | r/DCEUboners | 115 |
| 11 | r/Celebhub | 108 |
| 12 | r/news | 93 |
| 13 | r/celebritylegs | 79 |
| 14 | r/geekboners | 79 |
| 15 | r/GirlsMirin | 77 |
| 16 | r/celebJObuds | 75 |
| 17 | r/elonmusk | 67 |
| 18 | r/scifi | 63 |
| 19 | r/unpopularopinion | 52 |
fig = px.bar(df.subreddit.value_counts().to_frame().head(20).reset_index(), x="subreddit", y="index",
             height=500,
             title='Most used subreddits').update_layout(
                                                   xaxis_title='Number of contributions',
                                                   yaxis_title='subreddit').update_traces(marker_color='#5296dd')
fig.update_yaxes(autorange="reversed")
Merged Users Data with Comments & Submissions Data¶
Difference in time between creating the account and posting¶
# note: value_counts() drops NaN values by default
df_merged["days_after_creation"].value_counts()
1582.0 104
2599.0 37
2430.0 36
2449.0 35
487.0 31
...
3422.0 1
1740.0 1
919.0 1
923.0 1
1174.0 1
Name: days_after_creation, Length: 2115, dtype: int64
px.histogram(df_merged, x="days_after_creation", title='Days between account creation and contribution',
             nbins=250).update_traces(marker_color='#5296dd',).update_layout(
             xaxis_title='number of days',)
print('The number of contributions posted the same day the account was created:')
df_merged[df_merged['days_after_creation'] == 0].shape[0]
The number of contributions posted the same day the account was created:
24
print('The number of contributions posted within a week of account creation:')
df_merged[df_merged['days_after_creation'] <= 7].shape[0]
The number of contributions posted within a week of account creation:
62
print('The number of contributions posted within a month of account creation:')
df_merged[df_merged['days_after_creation'] <= 30].shape[0]
The number of contributions posted within a month of account creation:
171
df_merged[df_merged['days_after_creation'] <= 30]['user_created_at'].dt.year.value_counts()
2018 171 Name: user_created_at, dtype: int64
mask = (df_merged['days_after_creation'] <= 30) & (df_merged['user_created_at'].dt.year == 2018)
df_merged[mask]['user_created_at'].dt.strftime('%b').value_counts()
Jun 28 Nov 26 Dec 20 Jul 19 Oct 17 Feb 13 Mar 10 Apr 10 Aug 9 May 7 Sep 7 Jan 5 Name: user_created_at, dtype: int64
months = df_merged[df_merged['days_after_creation'] <= 30]['user_created_at'].dt.strftime('%b')
months_sorted = months.value_counts()[['Jan', 'Feb', 'Mar', 'Apr', 'May']]
months_sorted
Jan 5 Feb 13 Mar 10 Apr 10 May 7 Name: user_created_at, dtype: int64
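Indexing `value_counts()` with a hand-written label list, as above, raises a `KeyError` if any month happens to have zero accounts. A sketch of a safer idiom: reindex against the full calendar with `fill_value=0`, which also yields all twelve months in order.

```python
import calendar
import pandas as pd

def month_counts_in_order(months):
    """Order month-abbreviation counts Jan..Dec, filling absent months
    with 0 (plain label indexing fails on a missing month)."""
    order = list(calendar.month_abbr)[1:]   # ['Jan', ..., 'Dec']
    return months.value_counts().reindex(order, fill_value=0)
```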
fig = px.bar(months_sorted,
x=months_sorted.index, y=months_sorted.values, text=months_sorted.values)
fig.update_layout(
title={
        'text': "Contributions made within a month of account creation,<br>by account-creation month (Jan–May 2018)",
'x':0.5,
'xanchor': 'center',
'yanchor': 'top'
})
clrs = ['red' if (y > 100) else '#5296dd' for y in months_sorted.values]
fig.update_traces(marker_color=clrs,
marker_line_width=1.5, opacity=1, textposition='auto')
fig.show()
# THE SAME MONTH:
# check for the date these accounts posted/commented
reddit_30 = df_merged[df_merged['days_after_creation'] <= 30]
dates_count = reddit_30.groupby(reddit_30['created_at'].dt.date).size().reset_index(name='contributions')
dates_count.sort_values('contributions', ascending=False);
fig = px.bar(dates_count,
x='created_at',
y='contributions', title = 'contributions of the accounts that posted/commented the same month they were created')
fig.update_traces(marker_color='#5296dd',
marker_line_width=1, opacity=1, textposition='auto').update_layout()
fig.show()
# THE SAME WEEK
# check for the date these accounts posted/commented
reddit_7 = df_merged[df_merged['days_after_creation'] <= 7]
dates_count_7 = reddit_7.groupby(reddit_7['created_at'].dt.date).size().reset_index(name='contributions')
dates_count_7.sort_values('contributions', ascending=False);
fig = px.bar(dates_count_7,
x='created_at',
y='contributions', title = 'contributions of the accounts that posted/commented the same week they were created')
fig.update_traces(marker_color='#5296dd',
marker_line_width=1, opacity=1, textposition='auto').update_layout()
fig.show()
# THE SAME DAY
# check for the date these accounts posted/commented
reddit_1 = df_merged[df_merged['days_after_creation'] == 0]
dates_count_1 = reddit_1.groupby(reddit_1['created_at'].dt.date).size().reset_index(name='contributions')
dates_count_1.sort_values('contributions', ascending=False);
fig = px.bar(dates_count_1,
x='created_at',
y='contributions', title = 'contributions of the accounts that posted/commented the same day they were created')
fig.update_traces(marker_color='#5296dd',
marker_line_width=.5, opacity=1, textposition='auto').update_layout()
fig.show()
# get the names of authors who commented negatively within a month of account creation
# to add to the suspected list
df_merged_30 = df_merged.query("days_after_creation <= 30 and sentiment_blob == 'Negative' and sentiment_nltk == 'Negative'")
df_merged_30.head()
| child_id | permalink | text | parent_id | subreddit | created_at | sentiment_blob | sentiment_nltk | score | top_level | ... | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | diff | days_after_creation | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3144 | t1_dyc2my4 | /r/celebJObuds/comments/8gdrmz/any_buds_want_t... | I'd fuck the crazy right out of her | t3_8gdrmz | r/celebJObuds | 2018-05-02 17:32:46 | Negative | Negative | 2.0 | submission | ... | False | False | False | 1022.0 | 2649.0 | 2018-04-04 04:06:28 | others | 2018 | 28 days 13:26:18 | 28.0 |
| 3262 | t1_dytrknx | /r/premed/comments/8iiisc/scrolling_through_my... | That makes me so mad! Damn you, Caduceus! | t1_dys31hq | r/premed | 2018-05-11 19:17:10 | Negative | Negative | 1.0 | comment | ... | False | False | False | 8516.0 | 2057.0 | 2018-04-27 18:57:47 | others | 2018 | 14 days 00:19:23 | 14.0 |
| 4394 | t1_e2ulws0 | /r/movies/comments/910nqx/just_a_friendly_remi... | OP really picked the wrong example to prove th... | t1_e2ujdpj | r/movies | 2018-07-22 20:58:35 | Negative | Negative | 24.0 | comment | ... | False | False | False | 34616.0 | 42684.0 | 2018-07-15 17:03:43 | unverified | 2018 | 7 days 03:54:52 | 7.0 |
| 4545 | t1_e2zfi6i | /r/JerkOffToCelebs/comments/91mf8f/which_redhe... | ScarJo i mean dat ass | t3_91mf8f | r/JerkOffToCelebs | 2018-07-25 02:22:00 | Negative | Negative | 11.0 | submission | ... | False | False | False | 309.0 | 1.0 | 2018-07-21 15:43:24 | unverified | 2018 | 3 days 10:38:36 | 3.0 |
| 5474 | t1_e8ffubo | /r/CelebAssPussyMouth/comments/9r48bq/another_... | G and K all day long.\n\nIt was a difficult ch... | t3_9r48bq | r/CelebAssPussyMouth | 2018-10-25 14:56:22 | Negative | Negative | 4.0 | submission | ... | False | False | False | 684.0 | 600.0 | 2018-10-01 12:42:45 | others | 2018 | 24 days 02:13:37 | 24.0 |
5 rows × 24 columns
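From the filtered frame, the suspected list is just the de-duplicated author names; a minimal sketch with a hypothetical stand-in for `df_merged_30` (assuming `user_name` holds the author, as elsewhere in `df_merged`):

```python
import pandas as pd

# hypothetical frame standing in for df_merged_30
df_merged_30 = pd.DataFrame({
    'user_name': ['u1', 'u2', 'u1', 'u3'],
    'days_after_creation': [28, 14, 7, 3],
})

# de-duplicated, sorted list of suspect accounts
suspected = sorted(df_merged_30['user_name'].unique())
print(suspected)  # ['u1', 'u2', 'u3']
```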
Contributions in 2018 by Account Creation Year¶
# count 2018 contributions by the year the posting account was created
df_contributions = df_merged.groupby(df_merged['user_created_at'].dt.year).size().reset_index(name='n_contributions')
fig = px.bar(df_contributions,
             x='user_created_at', y='n_contributions', text='n_contributions', title='Number of 2018 contributions by account creation year')
fig.update_traces(marker_color='#5296dd',
marker_line_width=1.5, opacity=1, textposition='auto')
fig.show()
Contribution count by month in 2018¶
fig = px.bar(df.groupby(df['created_at'].dt.month).size().reset_index(name='contribution_count'),
x='created_at', y='contribution_count', text='contribution_count')
fig.update_layout(
title={
'text': "Estimated number of contributions in each month of 2018",
'x':0.5,
'xanchor': 'center',
'yanchor': 'top'
})
fig.update_layout(
xaxis = dict(
title='Month(2018)',
tickmode = 'array',
tickvals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
ticktext = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
)
)
clrs = ['red' if (y > 4000) else '#5296dd' for y in df.groupby(df['created_at'].dt.month).size()]
fig.update_traces(marker_color=clrs,
marker_line_width=1.5, opacity=1, textposition='auto')
fig.show()
Contribution count by day of month in 2018¶
fig = px.bar(df.groupby(df['created_at'].dt.day).size().reset_index(name='contribution_count'),
x='created_at', y='contribution_count', text='contribution_count')
fig.update_layout(
title={
'text': "Estimated number of contributions on each day of the month in 2018",
'x':0.5,
'xanchor': 'center',
'yanchor': 'top'
})
fig.update_layout(
xaxis = dict(
title='Day of Month (2018)',
tickmode = 'linear',
)
)
clrs = ['red' if (y > 1500) else '#5296dd' for y in df.groupby(df['created_at'].dt.day).size()]
fig.update_traces(marker_color=clrs,
marker_line_width=1.5, opacity=1, textposition='auto')
fig.show()
On which day of the week were the most contributions made?¶
week_day = df['created_at'].dt.strftime('%a')
# one can sort in any order by providing a custom index explicitly:
# https://stackoverflow.com/questions/43855474/changing-sort-in-value-counts/43855492
week_sorted = week_day.value_counts()[['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']]
week_sorted
Mon 811 Tue 1111 Wed 1406 Thu 1185 Fri 865 Sat 741 Sun 874 Name: created_at, dtype: int64
fig = px.bar(df.groupby(df['created_at'].dt.dayofweek).size().reset_index(name='contribution_count'),
x='created_at', y='contribution_count', text='contribution_count')
fig.update_layout(
title={
'text': "Estimated number of contributions on each day of the week (2018)",
'x':0.5,
'xanchor': 'center',
'yanchor': 'top'
})
fig.update_layout(
xaxis = dict(
title='DayOfWeek(2018)',
tickmode = 'array',
tickvals = [0, 1, 2, 3, 4, 5, 6],
ticktext = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
)
)
clrs = ['red' if (y > 1300) else '#5296dd' for y in df.groupby(df['created_at'].dt.dayofweek).size()]
fig.update_traces(marker_color=clrs,
marker_line_width=1.5, opacity=1, textposition='auto')
fig.show()
Check the hour the contributions were made (2018)¶
# check for the hour the contributions were made
df_hours = df.groupby(df['created_at'].dt.hour).size().reset_index(name='contribution_count')
# df_hours.sort_values('contribution_count', ascending=False);
fig = px.bar(df_hours,
x='created_at', y='contribution_count',
title='Number of contributions (Comment/Submission) by hour of day (2018)')
fig.update_layout(
xaxis = dict(
title='Hours of Day',
tickmode = 'linear',
dtick = 1
)
)
fig.update_traces(marker_color='#5296dd',
marker_line_width=1.5, opacity=1, textposition='auto').update_layout()
fig.show()
It's suspicious that contributions stay high around the clock!
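That flatness can be quantified: human-driven traffic usually dips overnight, so a near-zero spread across the 24 hourly counts is itself a bot signal. A sketch with hypothetical hourly counts; the coefficient-of-variation measure is an illustration, not part of the original analysis:

```python
import pandas as pd

# hypothetical hourly contribution counts (24 buckets, hours 0-23)
flat = pd.Series([50] * 24)             # bot-like: constant all day
human = pd.Series([5] * 8 + [60] * 16)  # dips during the night hours

def hourly_cv(counts):
    """Coefficient of variation: std / mean. Near 0 => suspiciously flat."""
    return counts.std() / counts.mean()

print(round(hourly_cv(flat), 2))   # 0.0
print(round(hourly_cv(human), 2))  # 0.64
```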
Which dates have the highest number of contributions?¶
df.created_at.dt.date.value_counts().head()
2018-12-19 244 2018-07-03 190 2018-12-20 150 2018-07-22 137 2018-07-24 128 Name: created_at, dtype: int64
trendy_dates = df.groupby(df['created_at'].dt.date).size().reset_index(name='contribution_count')
fig = px.bar(trendy_dates,
x='created_at', y='contribution_count')
fig.update_layout(
title={
'text': "The number of contributions created on each date",
'x':0.5,
'xanchor': 'center',
'yanchor': 'top'
})
fig.update_traces(marker_color='#5296dd',
marker_line_width=0.5, opacity=1, textposition='auto').update_layout()
fig.show()
trendy_dates.sort_values('contribution_count', ascending=False)
| created_at | contribution_count | |
|---|---|---|
| 339 | 2018-12-19 | 244 |
| 173 | 2018-07-03 | 190 |
| 340 | 2018-12-20 | 150 |
| 192 | 2018-07-22 | 137 |
| 194 | 2018-07-24 | 128 |
| ... | ... | ... |
| 50 | 2018-02-26 | 1 |
| 87 | 2018-04-04 | 1 |
| 266 | 2018-10-07 | 1 |
| 43 | 2018-02-19 | 1 |
| 116 | 2018-05-04 | 1 |
352 rows × 2 columns
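Beyond eyeballing the bar chart, spike dates can be flagged programmatically. A sketch with a hypothetical stand-in for `trendy_dates`; the 3×-median threshold is an assumption, not part of the original analysis:

```python
import pandas as pd

# hypothetical daily counts standing in for trendy_dates
daily = pd.DataFrame({
    'created_at': pd.to_datetime(
        ['2018-07-02', '2018-07-03', '2018-07-04', '2018-12-19', '2018-12-21']),
    'contribution_count': [20, 190, 25, 244, 30],
})

# flag days with more than 3x the median daily volume
threshold = 3 * daily['contribution_count'].median()
spikes = daily[daily['contribution_count'] > threshold]
print(spikes['created_at'].dt.strftime('%Y-%m-%d').tolist())
# ['2018-07-03', '2018-12-19']
```

The median keeps the threshold robust: unlike the mean, it is barely moved by the spikes themselves.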
# get the top 5 trendy dates first, then sort them by date
top_trendy_dates = trendy_dates.sort_values('contribution_count', ascending=False).head(5)
top_trendy_dates.sort_values('created_at', inplace=True)
top_trendy_dates
| created_at | contribution_count | |
|---|---|---|
| 173 | 2018-07-03 | 190 |
| 192 | 2018-07-22 | 137 |
| 194 | 2018-07-24 | 128 |
| 339 | 2018-12-19 | 244 |
| 340 | 2018-12-20 | 150 |
top_trendy_dates.reset_index(inplace=True)
fig = px.bar(top_trendy_dates,
x='created_at', y='contribution_count', title='Number of contributions (Comment/Submission) on trendy dates')
fig.update_layout(
xaxis = dict(
title='Contribution Date',
tickmode = 'array',
tickvals = top_trendy_dates.created_at,
)
)
clrs = ['red' if (y > 400) else '#5296dd' for y in top_trendy_dates.contribution_count]
fig.update_traces(marker_color=clrs,
opacity=1, textposition='auto').update_layout()
# marker_line_width=1.5,
fig.show()
def peak_days(date):
    df_merged_peak = df_merged[df_merged.created_at.dt.strftime('%Y-%m-%d') == date]
    print(f'How many users contributed on the peak day ({date})')
    print(df_merged_peak.user_name.nunique())
    print('Years')
    # check the year the contributing accounts were created
    df_user_year = df_merged_peak.groupby(df_merged_peak['user_created_at'].dt.year).size().reset_index(name='contribution_count')
    fig = px.bar(df_user_year,
                 x='user_created_at', y='contribution_count',
                 title=f'The creation year of the accounts that contributed on the peak day ({date})')
    fig.update_layout(
        xaxis = dict(
            title='Account Creation Year',
            tickmode = 'linear',
            dtick = 1
        )
    )
    clrs = ['red' if (y > 250) else '#5296dd' for y in df_user_year.contribution_count]
    fig.update_traces(marker_color=clrs,
                      marker_line_width=1.5, opacity=1, textposition='auto')
    fig.show()
    print('hours')
    df_peak = df[df.created_at.dt.strftime('%Y-%m-%d') == date]
    # check the hour the contributions were made
    df_hours = df_peak.groupby(df_peak['created_at'].dt.hour).size().reset_index(name='contribution_count')
    fig2 = px.bar(df_hours,
                  x='created_at', y='contribution_count',
                  title='Number of contributions (Comment/Submission) by hour of day')
    fig2.update_layout(
        xaxis = dict(
            title='Hours of Day',
            tickmode = 'linear',
            dtick = 1
        )
    )
    clrs = ['red' if (y > 80) else '#5296dd' for y in df_hours.contribution_count]
    fig2.update_traces(marker_color=clrs,
                       marker_line_width=1.5, opacity=1, textposition='auto')
    fig2.show()
peak_days('2018-07-03')
How many users contributed on the peak day (2018-07-03) 98 Years
hours
peak_days('2018-07-22')
How many users contributed on the peak day (2018-07-22) 50 Years
hours
peak_days('2018-07-24')
How many users contributed on the peak day (2018-07-24) 79 Years
hours
peak_days('2018-12-19')
How many users contributed on the peak day (2018-12-19) 96 Years
hours
peak_days('2018-12-20')
How many users contributed on the peak day (2018-12-20) 94 Years
hours